
I want to build a real-time data streaming anomaly detection service with clearml-serving
Oh, so the way it currently works, clearml-serving pushes the data in real time into Prometheus (you can control the stats/input/output), then you can build the anomaly detection in Grafana (for example, alerts on histograms over time are available out of the box, and ClearML creates the histograms over time).
Would you also need access to the stats data in Prometheus? Or are you saying you need to process it ...
each epoch runs about 55 minutes, and that screenshot I posted earlier kind of shows the logs for the rest of the info being output, if you wanted to check that out
I thought you disabled the stdout log, no?
Maybe ClearML is using tensorboard in ways that I can fine tune? I
You can open your TB and see; every report there is logged into ClearML
So I assume, trains assumes I have nvidia-docker installed on the agent machine?
docker + nvidia-docker-runtime are assumed to be installed
nvidia/cuda docker image is pulled when requested (like any other container image)
Moreover, since I'm going to use Task.execute_remotely (and not through the UI) is there any code way to specify the docker image to be used?
Sure, task.set_base_docker(docker_cmd='nvidia/cuda -v /mnt:/tmp')
Notice that you can not only pass the dock...
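A minimal sketch of the same idea, assuming a recent clearml SDK where set_base_docker also accepts separate docker_image and docker_arguments fields. The helper names and the "default" queue name are hypothetical, and the task object is whatever Task.init(...) returned:

```python
def split_docker_cmd(docker_cmd):
    """Split 'image arg1 arg2 ...' into (image, 'arg1 arg2 ...')."""
    parts = docker_cmd.split(None, 1)
    if not parts:
        raise ValueError("docker_cmd must contain at least an image name")
    image = parts[0]
    arguments = parts[1] if len(parts) > 1 else ""
    return image, arguments


def run_remotely_in_docker(task, docker_cmd, queue_name="default"):
    """Set the container on the task, then hand the task over to an agent queue.

    `queue_name` is a hypothetical queue; use one your agents listen on.
    """
    image, arguments = split_docker_cmd(docker_cmd)
    # The container must be set before execute_remotely(), because that call
    # stops the local process once the task has been enqueued.
    task.set_base_docker(docker_image=image, docker_arguments=arguments)
    task.execute_remotely(queue_name=queue_name)
    return image, arguments


# Usage with a real task (assumes clearml is installed and configured):
#   from clearml import Task
#   task = Task.init(project_name="examples", task_name="remote docker")
#   run_remotely_in_docker(task, "nvidia/cuda -v /mnt:/tmp")
```

Splitting the string is only for readability; passing the whole string as docker_cmd, as in the answer above, expresses the same thing.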
a bit sad that there is no working integration with one of the leading time series frameworks...
You mean a series Darts reports? If it does report it, where does it do so? Are you suggesting we add a Darts integration (which sounds like a good idea)?
... training script was set to upload every epoch. Seems like this resulted in a torrent of metrics being uploaded.
Oh, that makes sense. So basically you were bombarding the server with requests and ending up with a kind of denial of service
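One possible way to avoid flooding the server, not necessarily what was done in this thread, is to forward only every N-th report. A minimal sketch, assuming the wrapped object exposes report_scalar(title, series, value, iteration) the way the ClearML Logger does; the class name and the `every` parameter are hypothetical:

```python
class ThrottledScalarReporter:
    """Forward only every `every`-th iteration to the wrapped logger."""

    def __init__(self, logger, every=10):
        if every < 1:
            raise ValueError("every must be >= 1")
        self._logger = logger
        self._every = every

    def report_scalar(self, title, series, value, iteration):
        """Report the scalar if the iteration is a multiple of `every`.

        Returns True when the value was forwarded, False when it was skipped.
        """
        if iteration % self._every != 0:
            return False
        self._logger.report_scalar(
            title=title, series=series, value=value, iteration=iteration
        )
        return True


# Usage with a real logger (assumes clearml is installed and configured):
#   from clearml import Task
#   task = Task.init(project_name="examples", task_name="throttled metrics")
#   reporter = ThrottledScalarReporter(task.get_logger(), every=50)
#   reporter.report_scalar("loss", "train", 0.42, iteration=100)
```

The skipped values are simply dropped here; averaging them before reporting would be another reasonable choice.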
Wait, with the Port it does not work?
Notice that since this is an external S3, you have to specify the port so it knows this is not AWS S3 but a different compatible service
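A small sketch of building such a URI with the port always included. The function name, host, port and bucket are hypothetical placeholders; the matching credentials entry in clearml.conf (under sdk.aws.s3.credentials) is assumed to use the same "host:port" value:

```python
def s3_compatible_uri(host, port, bucket, prefix=""):
    """Build an s3:// URI that carries an explicit port.

    The explicit port is what marks the target as a non-AWS,
    S3-compatible service (for example a self-hosted object store).
    """
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port must be in the range 1-65535")
    uri = "s3://{}:{}/{}".format(host, port, bucket.strip("/"))
    prefix = prefix.strip("/")
    return uri + "/" + prefix if prefix else uri


# Usage (assumes clearml is installed and configured):
#   from clearml import Task
#   uri = s3_compatible_uri("minio.example.local", 9000, "models")
#   task = Task.init(project_name="examples", task_name="s3", output_uri=uri)
```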
ConvolutedSealion94 if you do:
cd ~/work/repo/code/
git status
what are you getting?
Hi LudicrousParrot69
Not sure I follow, is this pyfunc running remotely?
Or are you looking to interface with previously executed Tasks?
That is a bit odd, but SSH keys have to have specific chmod flags for them to work (security reasons)
What was the error?
BTW: why use the CLI? The idea of ClearML is that it becomes part of the code, even in the development process. This means adding "Task.init(...)" at the beginning of the code, which creates the Tasks and logs them as part of the development. That means executing them is essentially cloning and enqueuing in the UI. Of course, you can also automate it directly as part of the code.
Sorry ScaryLeopard77 I missed the reply,
the tutorial in the readme of clearml-serving repo doesn't mention it though. Where should I set it?
oh dear ... you are right (I think it was there in previous versions)
clearml-serving --help
https://github.com/allegroai/clearml-serving/blob/ce6ec847b1e01c6f5bf35d638e6ceb8148db8a7a/clearml_serving/main.py#L142
This is the equivalent of what is created here in the example:
https://github.com/allegroai/clearml-serving/blob/ce6ec847b...
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).
Understood, basically the moment we add nested project view to the dataset (and pipelines for that matter, and both are already being worked on), it should solve everything. Is that correct?
Hi SmallTurkey79
This call sets the requirements of an existing (already created) Task. Since the Task was just created, it waits for the automatic package detection before overriding it.
What you want is "Task.force_requirements_env_freeze" (notice it is class level and needs to be called before Task.init)
from clearml import Task
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(...)
Hmm, I guess we should state that better, I'll pass it on
Besides that, what are your impressions on these serving engines? Are they much better than just creating my own API + ONNX or even my own API + normal PyTorch inference?
I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when serving multiple models, as well as the ability to quickly update models from the ClearML model repository (just tag + publish and the end...
ElegantKangaroo44 definitely a bug, will be fixed in 0.15.1 (release in a week or so)
https://github.com/allegroai/trains/issues/140
Can I assume that if we have two agents spinning the same experiment, your code will take it from there?
Is this true?
Let me check if we can reproduce it
Hi NonchalantOx99
I would assume the clearml-server configuration / access key is misconfigured in your copy of example.env
Yeah, the ultimate goal I'm trying to achieve is to run tasks flexibly. For example, before running, a task could have a claim saying how many resources I need, and the agent would run it as soon as it finds there are enough resources
Check out Task.execute_remotely()
You can put it anywhere in your code. When execution gets to it, if you are running without an agent, it will stop the process and enqueue the Task to be executed remotely; on the remote machine the call itself becomes a no-op.
I...
What sort of data would be stored in the venvs-build folder?
ClumsyElephant70 temporary (lifetime of the task execution) virtual environment, including the code etc. It is deleted and recreated for every new task launched (or restored from cache, if venvs_cache is enabled)
Is there any known issue with Amazon SageMaker and ClearML
On the contrary, it actually works better on SageMaker...
Here is what I did on SageMaker: I created a new SageMaker instance, opened Jupyter notebook, and started a new notebook (conda_python3 / conda_py3_pytorch).
Then I just did "!pip install clearml" and Task.init
Is there any difference?
give me a minute to test
Yes, the one you create manually is not really of the same "type" as the one you create online, this is why you do not see it there
Found it, definitely a bug in the callback, it has no effect on the HPO process itself
PunyBee36 to get HTTPS, add an AWS ELB in front of the server; the ELB will add HTTPS to any outside connection
GreasyPenguin14 the demo-server is soon to be deprecated, so we are slow on upgrades there. But you can already see it in the SaaS free tier.
https://app.community.clear.ml/
We use an empty queue to enqueue our tasks in, just to trigger the scheduler
its only importance is that the experiment is not enqueued anywhere else, but the trigger then enqueues it
It's just that the trigger is never triggered
(Except when a new task is created - this was not the case)
Is the trigger controller running on the services queue?
Hi SillyRobin38
Hi everyone, I wanted to inquire if it's possible to have some type of model unloading.
What do you mean by "unloading"? Do you mean removing it from the clearml-serving endpoint?
If this is from clearml-serving, then yes, you can do it online:
None
poetry stores git related data in ... you get an internal package we have with its version, but no git reference, i.e. internal_module==1.2.3 instead of internal_module @H4dr1en
This seems like a bug with poetry (and I think I have run into this one), worth reporting it, no?