Hi GrievingTurkey78
I'm assuming similar to https://github.com/pallets/click/
?
Auto connect and store/override all the parameters?
can configuration objects refer to one-another internally in ClearML?
Interesting, please explain?
We abuse the object description here to store the desired file path.
LOL, yep that would work. I'm assuming you have some infrastructure library that does this hack for you, but it's a really cool way around it 🙂
And last but not least, for dictionary for example, it would be really cool if one could do:
Hmm, what you will end up with now is the following behaviour: `my_other_config['bar']` will hold a copy of `my_config`, and if you clone the Task and change "my_config" it will hav...
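The copy semantics described above can be illustrated with plain Python dicts (just a sketch of the behaviour; the `my_config` / `my_other_config` names come from the example above and this is not an actual ClearML API call):

```python
# Plain-dict illustration of "my_other_config['bar'] holds a copy of my_config".
my_config = {"foo": 1}
my_other_config = {"bar": dict(my_config)}  # 'bar' stores a *copy*, not a reference

my_config["foo"] = 2  # mutating the original afterwards...
# ...does not change the stored copy:
print(my_other_config["bar"])  # {'foo': 1}
```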
GrotesqueDog77 when you say "the second issue" , do you mean the fact that both step 1 and step 2 should have access to the same filesystem?
I see, would having this feature solve it (i.e. base docker + bash init script)?
https://github.com/allegroai/trains/issues/236
, I thought there would be some hooks for deploying, where the integration with k8s was also taken care of automatically.
Hi ObedientToad56
Yes, you are correct. Basically you now have a docker-compose spinning everything up (though, for example, you can also spin up a standalone container, mostly for debugging).
We are working on a k8s helm chart so the deployment is easier; it will be based on this docker-compose:
https://github.com/allegroai/clearml-serving/blob/main/docker/docker-comp...
I mean to reduce the API calls without reducing the scalars that are logged, e.g. by sending less frequent batched updates.
Understood,
In my current trials I am using up the API calls very quickly though.
Why would that happen?
The logging is already batched (meaning one API call for a bunch of stuff).
Could it be lots of console lines?
BTW you can set the flush period to 30 sec, which would automatically collect and batch API calls:
https://github.com/allegroai/clearml/blob/25df5efe7...
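For reference, a minimal `clearml.conf` fragment (a sketch: the `report_period_sec` key name is taken from the default SDK config, and the 30-second value is just the example above):

```
sdk {
    development {
        worker {
            # seconds to wait between sending batched reports to the server
            report_period_sec: 30
        }
    }
}
```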
LuckyRabbit93 We do!!!
Hi @<1690896098534625280:profile|NarrowWoodpecker99>
Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,
yes it does.
Are there configuration options available that allow us to control this behavior?
I'm assuming you're thinking of dynamically loading/unloading models from memory based on requests?
I wish Triton added that 🙂 this is not trivial, and in reality, to be fast enough the model has to live in RAM and then be moved to the GPU (...
ReassuredTiger98 regarding the agent error, can you see the package some_packge
in the "Installed Packages" in the UI? Was it installed? Are you using pip or conda as the package manager in the agent (check the clearml.conf)? Is the agent running in docker mode?
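The package-manager setting lives in the agent section of `clearml.conf`; a minimal fragment for reference (key names from the default agent config, values illustrative):

```
agent {
    package_manager {
        # which package manager the agent uses to install requirements
        type: pip   # or: conda
    }
}
```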
Hi @<1614069770586427392:profile|FlutteringFrog26>
So since you have the Task id. you do:
task = Task.get_task("task id here")
Then to get the models
models = task.models["output"]
the models object is both a list and a dict; if you want the last one you do last_model = models[-1]
if you know the best model's name you do model = models["best model"]
(notice the model name is the exact one you see in the UI). Once you have the model object you can get a copy with `model.get_lo...
Not at all, we love ideas on improving ClearML.
I do not think there is a need to replace feast, it seems to do a lot; I'm just thinking of integrating it into the ClearML workflow. Do you have a specific use case we could start working on? Or maybe a workflow that would make sense to implement?
Is there a way to do this all elegantly?
Oh yes there is, this is how TaskB's code will look:
` task = Task.init(..., 'task b')
param = {'TaskA' :'TaskAs ID HERE'}
task.connect(param)
taska_model = Task.get_task(param['TaskA']).models['output'][-1]
torch.load(taska_model.get_local_copy())
train
torch.save('modelb') `I might have missed something there, but generally speaking this will let you:
Select TASKA as a parameter of TaskB training process Will register automagically Tasks'A...
(as I see it, the services worker is only in the services queue, and not in my default queue, where my other servers/workers are)
So basically the service-mode is just a flag passed to the agent, and the services queue is the name of the queue it will pull from.
If i want a normal worker also
You can just add another section to the docker-compose, or run it manually after you spin the docker-compose.
LazyFox65 wdyt ?
Thanks JuicyFox94 for letting us know.
I'm checking what's the status with it
Hi RobustGoldfish9 ,
I'd much rather just have trains-agent just automatically build the image defined there than have to build the image separately and make it available for all the agents to pull.
Do you mean there is no docker image in the artifactory built based on your Dockerfile ?
You can disable it with:
Task.init('example', 'train', auto_connect_frameworks={'pytorch': False})
WittyOwl57 that is odd, there is a specific catch for SystemExit:
https://github.com/allegroai/clearml/blob/51d70efbffa87aa41b46c2024918bf4c584f29cf/clearml/backend_interface/task/repo/scriptinfo.py#L773
How do I reproduce this issue/warning ?
Also: "Repository and package analysis timed out (300.0 sec), giving up". Seriously, over 5 minutes?! How large is the git repo?
RobustGoldfish9 I see.
So in theory, spinning an experiment on an agent would be: clone code -> build docker -> mount code -> execute code inside docker?
(no need for requirements etc.?)
MelancholyBeetle72 thanks! I'll see if we could release an RC with a fix soon, for you to test :)
Actually it would be interesting to combine the two, feast is fully open-source and supported by the linux foundation, so I cannot see the harm in that.
wdyt?
Hi SmoggyGoat53
What do you mean by "feature store" ? (These days the definition is quite broad, hence my question)
What's the general pattern for running a pipeline - train model, evaluate metrics and publish the model if satisfactory (based on a threshold, for example)?
Basically I would do:
parameters for pipeline:
TaskA = Training model Task (think of it as our template Task)
Metric = title/series/sign we want to choose based on, where sign is max/min
Project = Project to compare the performance so that we could decide to publish based on the best Metric.
Pipeline:
Clone TaskA Change TaskA argu...
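The compare-and-publish decision above can be sketched in plain Python (no ClearML calls here; the task IDs and metric values are made up for illustration, and in a real pipeline the metrics would come from the cloned tasks' reported scalars):

```python
# Hypothetical metric value per completed task in the project.
project_metrics = {
    "task_a_clone_1": 0.91,
    "task_a_clone_2": 0.94,
    "previous_best": 0.93,
}

def should_publish(candidate_id, metrics, sign="max"):
    """Publish the candidate only if it beats every other run on the chosen metric."""
    candidate = metrics[candidate_id]
    others = [v for k, v in metrics.items() if k != candidate_id]
    best_other = max(others) if sign == "max" else min(others)
    return candidate > best_other if sign == "max" else candidate < best_other

print(should_publish("task_a_clone_2", project_metrics))  # True
```

In the real pipeline, a `True` result would be followed by publishing the candidate task's output model.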
Well, that depends on you: what did you write there to know it is the best one? File name? Added some metric?
This is strange, let me see if we can get around it, because I'm sure it worked 🙂
Hi FancyWhale93 , in your clearml.conf configure the default output URI; you can specify the file server as the default, or any object storage:
https://github.com/allegroai/clearml-agent/blob/9054ea37c2ef9152f8eca18ee4173893784c5f95/docs/clearml.conf#L409
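A minimal `clearml.conf` fragment for this (the key name is from the linked default config; the bucket value is just a placeholder):

```
sdk {
    development {
        # where artifacts/models are uploaded by default;
        # can be the file server URL or any object storage
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```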
Hi GiganticTurtle0
ClearML will only list the directly imported packages (not their requirements), meaning in your case it will only list "tf_funcs" (which you imported).
But I do not think there is a package named "tf_funcs", right?