A few examples here:
None
Grafana model performance example:
browse to
login with: admin/admin
create a new dashboard
select Prometheus as data source
Add a query: 100 * increase(test_model_sklearn:_latency_bucket[1m]) / increase(test_model_sklearn:_latency_sum[1m])
Change type to heatmap, and select on the right hand-side under "Data Format" s...
I have to problem that "debug samples" are not shown anymore after running many iterations.
ReassuredTiger98 could you expand on it? What do you mean by "not shown anymore" ?
Can you see other reports ?
I think you are correct 😞 Let me make sure we add that (docstring and documentation)
I had again the same problem but within a remote pipeline setup.
Are you saying the ussue is not fixed? can you verify the pipeline & pipeline components are using the at least 1.104rc0 version?
We're not using a load balancer at the moment.
The easiest way is to add ELB and have amazon add the httpS on top (basically a few clicks on their console)
I failed to update the "STARTED AT" and the "COMPLETED AT" attributes in the "INFO" tab.
I'm not sure this can actually be overridden...
Hi RobustHippopotamus53
The way "latest from branch" works:
On the Task you specify the branch name (e.g. "master", no need to add the origin/ prefix) The agent then pulls the latest commit from that branch and updates back the Task to the current commit ID (the latest on the branch at the time of execution) This process ensures reproduciblity and traceability as we can always be certain the exact commit that was executed.Could it be the you "forced-push" a commit/squash, hence the "origina...
a bit sad that there is no working integration with one of the leading time series framework...
You mean a series darts reports ? if it does report it, where does it do so? are you suggesting we have Darts integration (which sounds like a good idea) ?
Hi GreasyPenguin14
- Did using auto_connect_frameworks={'pytorch': False} solved the issue? ( I imagine it did )
- Maybe we should have the option to have wildcard support so I will only auto log based on filename. Basically using auto_connect_frameworks={'pytorch': "model*.pt"} will only auto log the model files saved/logged , wdyt?
LazyLeopard18 could you explain some more on the specific use case you have in mind?
Btw I sometimes get a gzip error when I am accessing artefacts via the '.get()' part.
Hmm this is odd, is this a download issue? if this is reproducible maybe we should investigate further...
PricklyRaven28 did you set the iam role support in the conf?
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/docs/clearml.conf#L86
Come to think about it, maybe we should have "parallel_for" as a utility for the pipeline since this is so useful
Hi ExcitedFish86
Good question, how do you "connect" the 3 nodes? (i.e. what the framework you are using)
MelancholyElk85 assuming we are running with clearml 1.1.1 , let's debug the pipeline and instead of pipeline start/wait/stop :
Let's do:pipeline.start_locally(run_pipeline_steps_locally=False)
So I wonder - why should an agent be related to a specific user's credentials? Is the right way to go about this is to create a "fake user" for the sake of the agent?
Very true you have to have credentials for the trains-agent, so it can "report" to the trains-server, that said, the creator of the Task (i.e. the person who cloned it) will be registered as the "user" in the UI.
I would recommend to create an "agent" user and put it's credentials on the trains-agent machine (the same way...
If you have the check point (see output_uri for automatically uploading it) then you can always load it. Do you mean if you can change the iteration/ step counter? Or do you mean with trains-agent?
Funny this was mentioned just today 🙂
https://clearml.slack.com/archives/CTK20V944/p1620664770492400
Basically if you want to always ignore the "installed packages" in your clearml.conf, put:agent.package_manager.force_repo_requirements_txt = true
Noooooooooo, it is still working 🙂
The problem is, the configuration is loaded at import time, so there is no "time" to pass anything other than environment variable.
That said if the only difference is server config you can useTask.set_credentials
This is strange... Could you send the browser console log, maybe there is an exception there
Hi RotundHedgehog76
we have issues with
clearml-agent
when using standalone mode. ...
What is the use case for standalone mode? is this venv or docker mode?
Calling the script without the
PipelineDecorator.run_locally()
i.e. running the pipeline remotely still gives the
ModuleNotFoundError: No module named
Do you have the needed module listed on the pipeline controller Task ? (press on the details link, then go to Execution tab / "Installed Packages"
Hi SteadyFox10 the way it works is that Trains limits the debug image history by reusing the same files names, so the UI will only present the iterations where the debug images are relevant for. With your sample code it looks like it exposes a bug , the generated link should contain iteration number, it does not and so it overwrites the debug images every iteration. Here is the image link: https://demofiles.trains.allegro.ai/Test/test_images.6ed32a2b5a094f2da47e6967bba1ebd0/metrics/Test/te...
Hi LazyLeopard18 ,
See details below, are you using the win10 docker-compose yaml?
https://github.com/allegroai/trains-server/blob/master/docs/install_win.md
I want the model to be stored in a way that clearml-serving can recognise it as a model
Then OutputModel or task.update_output_model(...)
You have to serialize it, in a way that later your code will be able to load it.
With XGBoost, when you do model.save clearml automatically picks and uploads it for you
assuming you created the Task.init(..., output_uri=True)
You can also manually upload the model with task.update_output_model or equivalent with OutputModel class.
if you want to dis...
Hi @<1541954607595393024:profile|BattyCrocodile47>
It seems to me that instead of implementing webhooks to react to things like adding a tag to a model
Did you look at this example ?
None
Can we straightforwardly stream ALL ClearML events to another system?
what would you consider an event?
The "basic" object type is Task, a state in task is changed via an api call, would that be an e...