is there a way for me to get a link to the task execution? I want to write a message to slack, containing the URL so collaborators can click and see the progress
WackyRabbit7 Nice!
Basically you can use this one: task.get_output_log_web_page()
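For example, a minimal sketch of posting that link to Slack; the webhook URL and the project/task names are placeholders, and the Slack call assumes you already have an incoming webhook set up:

```python
import requests
from clearml import Task

task = Task.init(project_name="demo", task_name="training")

# Returns the URL of this task's console/log page in the web UI
url = task.get_output_log_web_page()

# Hypothetical incoming-webhook URL; replace with your own
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"
requests.post(SLACK_WEBHOOK_URL, json={"text": f"Follow the run here: {url}"})
```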
DefeatedOstrich93 can you verify that Lightning actually only stored it once?
LudicrousParrot69
I "think" I have a better handle on what you wish to do.
Is it a kind of generic "serving" solution?
FYI:
A model artifact is usually a weights/model file; the idea is that later you will be able to access it and serve it. Now the problem is (and I think this is what you are referring to) that there is usually a specific piece of code tied to that model that can use it (a.k.a. pyfunc).
A few ideas:
These days everyone is trying to build their models with a generic interface, so that scik...
I see... In the Triton pod, when you run it, it should print the combined pbtxt. Can you print both the before/after ones so that we can compare?
I see, so basically pull a fixed set of configuration for everyone from the server.
Currently only the scale/enterprise version supports such a feature 🙂
ZanyPig66 this should have worked, any chance you can send the full execution log (in the UI "results -> console" download full log) and attach it here? (you can also DM it so it is not public)
What's the trains-server version?
Sure: task = Task.init(..., auto_connect_arg_parser={'arg_not_to_log': False})
This will cause all argparse arguments to be automatically logged (and later editable), with the exception of the argument arg_not_to_log.
Notice that if you have --arg-something, exclude it by adding 'arg_something': False to the dict.
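A minimal sketch of the pattern, assuming a script with an argparse argument you want to keep out of the UI (the argument names and project/task names here are placeholders):

```python
from argparse import ArgumentParser
from clearml import Task

parser = ArgumentParser()
parser.add_argument("--arg-something", default="do-not-log-me")
parser.add_argument("--lr", type=float, default=0.001)

# Log every argparse argument except --arg-something;
# note the dict key uses an underscore, matching the argparse dest
task = Task.init(
    project_name="demo",
    task_name="argparse exclusion",
    auto_connect_arg_parser={"arg_something": False},
)
args = parser.parse_args()
```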
DepressedFox45
you can just copy/add this section 🙂
https://github.com/allegroai/clearml-agent/blob/e43f31eb80f9399da01dc5432cdacdf81c1bd084/docs/clearml.conf#L15
In my understanding requests still go through clearml-server, which configuration I left
DefiantHippopotamus88 actually this is not correct.
clearml-server only acts as a control plane: no actual requests are routed to it; it is used to sync model state, stats, etc., and is not part of the request processing flow itself.
curl: (56) Recv failure: Connection reset by peer
This actually indicates port 9090 is not being listened on...
What's the final docker-compose you are usi...
Hi @<1573119962950668288:profile|ObliviousSealion5>
Hello, I don't really like the idea of providing my own github credentials to the ClearML agent. We have a local ClearML deployment.
if you own the agent, that should not be an issue, no?
forward my SSH credentials using ssh -A and then starting the clearml agent?
When you are running the agent and you force git cloning with SSH, it will automatically map the .ssh folder into the container for git to use.
Ba...
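For reference, the relevant agent setting lives in clearml.conf; a sketch (the key exists in the reference clearml.conf, and the value shown is the behavior being discussed):

```
agent {
    # Force git cloning over SSH, so the agent maps your ~/.ssh into the container
    force_git_ssh_protocol: true
}
```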
Hi @<1566596960691949568:profile|UpsetWalrus59>
you should call it before initializing the Task
Task.ignore_requirements("pywin32")
task = Task.init(...)
In your Additional ClearML Configuration (which is basically the clearml.conf configuration), add the following:
environment {
    GOOGLE_APPLICATION_CREDENTIALS="~/gs.cred"
}
files {
    gsc {
        contents: "<this is your GCP storage credentials file>"
        path: "~/gs.cred"
    }
}
Reference:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L421
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a...
SillyPuppy19 I think this is a great idea, basically having the ability to have a callback function called before aborting/exiting the process.
Unfortunately, today abort gives the process 2 seconds to gracefully quit and then kills it. It was not designed to just send an abort signal, as more often than not that will not actually terminate the process.
Any chance I can ask you to open a GitHub Issue and suggest the callback feature? I have a feeling a few more users ...
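For illustration, the kind of hook being discussed would look roughly like plain Python signal handling (this is not an existing ClearML API, just a sketch of the idea):

```python
import signal
import sys

def on_abort(signum, frame):
    # Hypothetical cleanup: flush logs, close files, upload artifacts, etc.
    print("Abort signal received, cleaning up...")
    sys.exit(0)

# A handler like this would get a chance to run inside the ~2 second grace period
signal.signal(signal.SIGTERM, on_abort)
```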
RoundMosquito25 are you using clearml-agent daemon --stop
or are you killing them?
Killing them basically means you lose them in the UI when they time out; the backend does not see them for 10 min, so it assumes they died. When you call clearml-agent daemon --stop they will unregister themselves and disappear immediately.
btw: you can also use cron for that:
@reboot sleep 60 && clearml-agent daemon ...
Hmm that should have worked ...
I'm assuming the Task itself is running on a remote agent, correct?
Can you see the changes in the OmegaConf section?
What happens when you pass --args overrides="['dataset.path=abcd']"?
BTW:
Task.add_requirements('tensorflow', '2.2') will make sure you get the specified version 🙂
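A minimal sketch (project/task names are placeholders); note that add_requirements has to be called before Task.init:

```python
from clearml import Task

# Pin the version the agent will install when the task runs remotely
Task.add_requirements('tensorflow', '2.2')
task = Task.init(project_name='demo', task_name='pinned requirements')
```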
SolidSealion72 I'm able to reproduce, hurrah!
(and a fix is already being tested, I will keep you guys updated)
Hi FunnyTurkey96
Any chance you can try to run with the latest from GitHub? (I just tested your code and it seemed to work on my machine.)
pip install git+
There was an issue in some versions where seaborn plots were blank. Is that the case?
AntsyElk37
and when I try to use --output-uri I can't pass true because obviously I can't pass a boolean, only strings
Hmm, that sounds right. I think we should fix that so that when using --output-uri true
the value that is passed is actually True, not the string "true".
Regarding the issue itself:
are you saying --skip-task-init is being ignored, and it always adds the Task.init call? You can also pass --output-uri
https://files.clear.ml (which is the same as True), ...
MelancholyElk85 assuming we are running with clearml 1.1.1, let's debug the pipeline: instead of pipeline start/wait/stop, let's do:
pipeline.start_locally(run_pipeline_steps_locally=False)
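In context, this looks roughly like the sketch below (the controller setup is illustrative):

```python
from clearml import PipelineController

pipe = PipelineController(name='debug-pipeline', project='demo', version='1.0')
# ... pipe.add_step(...) calls go here ...

# Run the controller logic locally while the steps themselves
# are still enqueued for remote agents
pipe.start_locally(run_pipeline_steps_locally=False)
```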
Hi WittyOwl57
Are you starting a new server from scratch or is it running on previously stored data?
Well, PipelineDecorator actually allows you to do the same thing, with the same abilities, that is clone / modify / enqueue.
(I mean, a Pipeline from Tasks is also great; I just want to clarify that they have the same capabilities in this respect.)
Hi BroadMole98
A bit hacky but doable 🙂
task = Task.get_task(task_id='aabbcc')
task.get_logger().report_scalar(...)
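For completeness, report_scalar expects a title, series, value and iteration; a filled-in sketch with placeholder values:

```python
from clearml import Task

# Report a scalar into an existing task by its ID (the ID and values are placeholders)
task = Task.get_task(task_id='aabbcc')
task.get_logger().report_scalar(title='loss', series='train', value=0.05, iteration=100)
```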
Do you think ClearML is a strong option for running event-based training and batch inference jobs in production?
(I'm assuming by event-based you mean triggered by events, not streaming data, i.e. ETL etc.)
I know of at least a few large organizations doing that as we speak, so I cannot see any reason not to.
That'd include monitoring and alerting. I'm afraid that Metaflow will look far more compelling to our teams for that reason.
Sure, then use Metaflow. The main issue with Metaflow...
and when you remove the "." line, does it work?
Maybe different API version...
What's the trains-server version?