Wait IrritableOwl63, this looks like it worked, am I right? huggingface was correctly installed
Can you test with the credentials also in the global section
```
key: "************"
secret: "********************"
```
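A minimal sketch of what that could look like, assuming "global section" here refers to the top-level sdk.aws.s3 block of clearml.conf (adjust if you are using a different storage provider):
```
sdk {
    aws {
        s3 {
            # global credentials, used for any bucket without its own entry
            key: "************"
            secret: "********************"
        }
    }
}
```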
Also, what's the clearml python package version?
Hi @<1535069219354316800:profile|PerplexedRaccoon19>
On debugging, it looks like indices are corrupt.
ishhhhh, any chance you have a backup?
If this is how the repo links look, do not set anything in the clearml.conf
It "should" use ssh for the ssh links, and http for the http links.
It appears that they sell that as "Triton Management Service", part of .
It is possible to do it through their API, but it would need to be explicit.
We support that, but this is not dynamically loaded; this is just removing and adding models, it does not unload them from the GPU RAM.
That's the main issue: when we unload the model, it is unloaded. To make it dynamic, they need to be able to keep it in RAM and unload it from GPU RAM; that's the feature that is missing in all Triton deployments.
We do upload the final model manually.
If this is the case, just name it based on the parameters, no? Am I missing something?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/model.py#L1229
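As a rough sketch (project/task names and the parameter dict are placeholders), naming a manually uploaded model from its parameters could look like:
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model upload")
params = {"lr": 0.01, "epochs": 10}  # stand-in for your actual parameters
model = OutputModel(task=task, name=f"model_lr{params['lr']}_ep{params['epochs']}")
model.update_weights(weights_filename="model.pt")  # your final checkpoint file
```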
I was just wondering if I can make the autologging usable.
It kind of assumes these are different "checkpoints" on the same experiment, and then stores them based on the file name
You can however change the model names later:
` Task.current_task().mo...
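A rough sketch of that rename, assuming the task has at least one auto-logged output model and that the model object exposes a writable name:
```python
from clearml import Task

task = Task.current_task()
# take the most recently registered output model and rename it
last_model = task.models["output"][-1]
last_model.name = "a-descriptive-name-instead-of-the-file-name"
```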
So this can be translated to:
```
CLEARML__SDK__AZURE__STORAGE__CONTAINERS__0__ACCOUNT_NAME=abcd
```
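For reference, a sketch of the clearml.conf entry this environment variable maps to (following the standard azure.storage template):
```
sdk {
    azure.storage {
        containers: [
            {
                account_name: "abcd"
                # other fields (account_key, container_name, ...) map the same way:
                # CLEARML__SDK__AZURE__STORAGE__CONTAINERS__0__<FIELD_NAME>
            }
        ]
    }
}
```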
WackyRabbit7 just making sure I understand: MedianPredictionCollector.process_results is called after the pipeline is completed.
Then inside the function, Task.current_task() returns None.
Is this correct?
Hi @<1539055479878062080:profile|FranticLobster21>
Like this?
https://github.com/allegroai/clearml/blob/4ebe714165cfdacdcc48b8cf6cc5bddb3c15a89f[…]ation/hyper-parameter-optimization/hyper_parameter_optimizer.py
I'm wondering why this is the case, as docker best practices do indicate we should use a non-root user on production images.
Is the docker image for the service-agent not root?
using caching where specified but the pipeline page doesn't show anything at all.
What do you mean by "the pipeline page doesn't show anything at all"? Are you running the pipeline? How?
Notice PipelineDecorator.component needs to be top level, not nested inside the pipeline logic, like in the original example:
```python
@PipelineDecorator.component(
    cache=True,
    name=f'append_string_{x}',  # note: `x` must be defined when the decorator runs
)
def append_string(s):
    # toy body added for completeness
    return s + '_appended'
```
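For context, a minimal sketch of the pipeline logic that would call such a top-level component (names and project are illustrative):
```python
from clearml import PipelineDecorator

@PipelineDecorator.pipeline(name="toy pipeline", project="examples", version="0.1")
def pipeline_logic():
    # the component defined at module top level is simply called here
    print(append_string("hello"))

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug locally; remove to run on agents
    pipeline_logic()
```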
Hi WittyOwl57
I'm guessing clearml is trying to unify the histograms for each iteration, but the result is in this case not useful.
I think you are correct, the TB histograms are actually 3D histograms (i.e. 2D histograms over time, which would be the default for kernel/bias etc.)
is there a way to ungroup the result by iteration, and is it possible to group it by something else (e.g. the tags of the two plots displayed below side by side)?
Can you provide a toy example...
Task.debug_simulate_remote_task simulates the Task being executed by the agent (basically the same behaviour, only local). The argument it gets is the Task ID (string).
The way to see how it works is to run the code once (no debug_simulate call), get the Task ID we created, then rerun with debug_simulate_remote_task passing the previous Task ID.
Make sense ?
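A minimal sketch of that two-step flow (project/task names are placeholders):
```python
from clearml import Task

# step 1: run once without the line below and note the created Task ID
# step 2: rerun with it, calling it before Task.init()
Task.debug_simulate_remote_task(task_id="<task-id-from-the-first-run>")
task = Task.init(project_name="examples", task_name="debug demo")
```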
Hi SarcasticSparrow10
which database services are used to...
Mongo & Elastic
You can query everything using ClearML interface, or talk directly with the databases.
Full RestAPI is here:
https://clear.ml/docs/latest/docs/references/api/endpoints
You can use the APIClient for easier pythonic interface:
See example here
https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py
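As a rough illustration (the query filters are just an example), listing recent tasks with the APIClient could look like:
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# e.g. fetch the five most recently updated tasks
tasks = client.tasks.get_all(order_by=["-last_update"], page_size=5)
for t in tasks:
    print(t.id, t.name, t.status)
```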
What is the exact use case you have in mind?
can you tell me what the serving example is in terms of the explanation above and what the triton serving engine is,
Great idea!
This line actually creates the control Task (2):
```
clearml-serving triton --project "serving" --name "serving example"
```
This line configures the control Task (the idea is that you can do that even when the control Task is already running, but in this case it is still in draft mode).
Notice the actual model serving configuration is already stored on the created Task.
Hi @<1798887590519115776:profile|BlushingElephant78>
have multiple steps that run inside a gitlab repo
one thing to make sure is not missed about "inside a gitlab repo": notice the actual pipeline runs on agents/workers/k8s; gitlab will be the trigger and monitor it, but the actual execution should probably happen somewhere with proper compute
this works fine when i run the pipeline from the script directly but when i run it from the web interface
try to configure your...
Task status change to "completed" is set after all artifact uploads are completed.
JitteryCoyote63 that seems like the correct behavior for your scenario
try to break it into parts and understand what produces the error
for example:
```
increase(test12_model_custom:Glucose_bucket[1m])
increase(test12_model_custom:Glucose_sum[1m])
increase(test12_model_custom:Glucose_bucket[1m]) / increase(test12_model_custom:Glucose_sum[1m])
```
and so on
Yes, there was a bug where it was always cached; just upgrade the clearml package:
```
pip install git+
```
Hi @<1601023807399661568:profile|PompousSpider11>
Yes "activating" a conda/python environment in a docker is more complicated then it should be ...
To debug, what are you getting when you do:
```
docker run -it <docker name here> bash -c "set"
```
that using a “local” package is not supported
I see, I think the issue is actually pulling the git repo of the second local package, is that correct ?
(assuming you add the requirement manually, with Task.add_requirements), is that correct?
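For reference, a sketch of that manual requirement (the package name is a placeholder; note the call must come before Task.init):
```python
from clearml import Task

Task.add_requirements("my_local_package")  # placeholder package name
task = Task.init(project_name="examples", task_name="local package demo")
```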
Hi SmugSnake6
I think it was just fixed, let me check if the latest RC includes the fix
Hi RattyBat71
Do you tend to create separate experiments for each fold?
If you really want to parallelize the workload, then splitting it into multiple executions (i.e. passing the fold index of the same CV as an argument) makes sense; you can then compare/sort the results based on a specific metric. That said, if speed is not important, just having a single script with multiple CV folds might be easier to implement?!
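A minimal sketch of the per-fold split (argument and metric names are illustrative):
```python
import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--fold", type=int, default=0)
args = parser.parse_args()

# one experiment per fold; argparse arguments are auto-logged
task = Task.init(project_name="examples", task_name=f"cv-fold-{args.fold}")
# ... train / evaluate on fold `args.fold` ...
# report the metric you will later sort the experiments by
task.get_logger().report_scalar("validation", "accuracy", value=0.93, iteration=0)
```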
Can you see the repo itself ? the commit id ?
Hi @<1625303806923247616:profile|ItchyCow80>
Could you add some prints? Is it working without the Task.init call? The code looks okay, and the "No repository found" message basically says it is logged as a standalone script (which makes sense).
GiddyTurkey39
A flag would be really cool, just in case there's any problem with the package analysis.
Trying to think if this is a system wide flag (i.e. trains.conf) or a flag in task.init.
What do you think?
Hmm I just noticed:
'--rm', '', 'bash'
This is odd: there is an extra argument passed as "empty text". How did that end up there? Could it be you did not provide any docker image or default docker container?
This is an example of how one can clone an experiment and change it from code:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
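A rough sketch of that clone-and-modify flow with the current clearml package (the task ID, parameter name, and queue are placeholders):
```python
from clearml import Task

template = Task.get_task(task_id="<template-task-id>")
cloned = Task.clone(source_task=template, name="cloned experiment")
cloned.set_parameter("General/learning_rate", 0.001)  # placeholder parameter
Task.enqueue(cloned, queue_name="default")
```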
A full HPO optimization process (basically the same idea only with optimization algorithms deciding on the next set of parameters) is also available:
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
Hi GrievingTurkey78
the artifacts are downloaded to the cache folder (and by default the last 100 accessed artifacts are maintained there).
node executes the task, all the info will be erased, or does this have to be done explicitly?
Are you referring to the trains-agent running a docker?
By default the cache is persistent between execution (i.e. saving time on multiple downloads between experiments)
Hmm yeah I think that makes sense. Can you post here the arguments?
I'm assuming you have something like '1.23a' in the arguments?