Reputation
Badges 1
25 × Eureka!Yes including this. (There was a fix to an issue with trains-agent
and disabling frameworks, it is already part of 0.16.3 )
You can try callingtask._update_repository()
I'm still trying to figure out how to reproduce it...
Calling the script without the
PipelineDecorator.run_locally()
i.e. running the pipeline remotely still gives the
ModuleNotFoundError: No module named
Do you have the needed module listed on the pipeline controller Task ? (press on the details link, then go to Execution tab / "Installed Packages"
What's the jupyter / noetbook version you have?
Also from within the jupyter could you send me "sys.argv" ?
C will be submitted to a different queue and I donβt care as much
Is there a way to define βtask affinityβ in this way?
Hi RoughTiger69 ,
when you say Task affinity, you mean, I want C to be executed next to A/B ? Affinity as a concept doesn't really exist, it can be abstracted to a queue, where you have agents pulling from multiple queues. Then C can be pushed to one the the queues (in theory you might be able to programmtically control the Queue of C), wdyt?
Really what I need is for A and B to be separate tasks, but guarantee they will be assigned to the same machine so that the clearml dataset cache on that machine will be warm.
I think that what you are looking for is multi-machine cache (which is fully supported). Basically mount an NFS/SMB folder from a NAS to any of those machines, configure the cache folder to point to it, and not you do not need to worry about affinity ?
no?
Is there a way to group A and B into a sub-pipeline, h...
LovelyHamster1 what do you mean by "assume the permissions of a specific IAM Role" ?
In order to spin an ec2 instance (aws autoscaler) you have to have correct credentials, to pass those credentials you must create a key/secret pair to pass to the autoscaler. There is no direct support for IAM Role. Make sense ?
TrickyRaccoon92 the title
provided by write.scalars is also a representing string for the specific metric. This is more than just a title on the plot itself.
It means that this will be the name of the scalar metric (title/series combination) .
Is that your intention, or is it for viewing purpose only?
Hi ReassuredTiger98
but I would rather just define a function that returns the task directly
π
Check it out:
https://github.com/allegroai/clearml/blob/36ee3d61209e413a917d8a718fb25f389143cfa1/clearml/automation/controller.py#L205:param base_task_factory: Optional, instead of providing a pre-existing Task, provide a Callable function to create the Task (returns Task object)
I see, would having this feature solve it (i.e. base docker + bash init script)?
https://github.com/allegroai/trains/issues/236
Okay good news, there is a fix, bad news, sync to GitHub will only be tomorrow
HI BurlyRaccoon64
Yes, we did the latest clearml-agent solves the issue, please try:
'pip3 install -U --pre clearml-agent'
The pipeline stores the state of it's previous run, specifically the executed steps.
In our case the executed step was reset (I assume) so it cannot find the output model you are referring to, hence crashing
CleanPigeon16 make sense ?
yup, it's there in draft mode so I can get the latest git commit when it's used as a base task
Yes that seems to be the problem, if it is in draft mode, you have no outputs...
BTW: I tested the code you previously attached, and it showed the plot in the "Plots" section
(Tested with latest trains from GitHub)
What do you have under the "installed packages" ?
Hi ColossalAnt7
Following on SuccessfulKoala55 answer
I saw that there is a config file where you can specify specific users and passwords, but it currently requires
- mount the configuration file (the one holding the user/pass) into the pod from a persistent volume .
I think the k8s way to do this would be to use mounted config maps and secrets.
You can use ConfigMaps to make sure the routing is always correct, then add a load-balancer (a.k.a a fixed IP) for the users a...
Wait, that makes no sense to me. The API from python and the API from the UI are getting the same data from the backend ...
What are you getting with?from clearml import Task task = Task.get_task(task_id=<put task id here>) print(task.models)
Oh that makes sense.
So now you can just get the models as dict as well (basically clearml allows you to access them both as a list, so it is easy to get the last created, and as dict so you can match the filenames)
This one will get the list of modelsprint(task.models["output"].keys())
Now you can just pick the best onemodel = task.models["output"]["epoch13-..."] my_model_file = model.get_local_copy()
Well it seems we forgot that one π I'll quickly make sure it is there.
As a quick solution (no need to upgrade)task.models["output"]._models.keys()
That might be me, let me check...
Fixed in pip install clearml==1.8.1rc0
π
Hi GiddyPeacock64
If you already have K8s setup, and are already using ClearML.
In your kubeflow Yaml:trains-agent execute --id <task_id> --full-monitoring
This will install everything your Task needs inside the docker. Just make sure that you pass the env variable setting the ClearML , see here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L127
Hi BoredSquirrel45
as of today, my required packages aren't being recognized in cloned
Are you saying you are editing the code directly in the cloned Task, then enqueue the Task an the agent does not "auto recognize" the package ?
Requested version: 2.28, Used version 1.0" for some reason
This is fine that means there is no change in that API
Hi MagnificentSeaurchin79
This sounds like a deeper bug (of a sort), I think the best approach is to open a GitHub issue with some code that can reproduce this behavior, or at least enough information so that we could try to catch the bug.
This way we will make sure it is not forgotten.
Sounds good ?
Hmm interesting, I guess once you are able to connect it with ClearML you can just clone / modify / enqueue and let users train models directly from the UI on any hardware, is that the plan ?