Hi @<1600661423610925056:profile|StrongMouse81>
Using the serving base URL, and also any other model endpoint we add using:
clearml-serving model add
we get the attached response:
And other model endpoints are working for you?
DrabSwan66
Did you set "docker_install_opencv_libs: true" in your clearml.conf on the host machine ?
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docs/clearml.conf#L150
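If it is not set yet, a minimal sketch of what that might look like in clearml.conf (assuming the flag sits under the agent section, as in the linked default config):
```
agent {
    # install the system libraries OpenCV needs inside the agent's docker container
    docker_install_opencv_libs: true
}
```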
Just making sure, you are running clearml-agent in docker mode, correct?
What's the container you are using ?
I am symlinking the .clearml directory to a NAS server and this is perhaps part of the problem.
Yep, that sounds about right. It uses the POSIX file system for its internal lock mechanism (multi-process locks), and my guess is that the NAS for some reason does not support it...
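If you want to sanity-check that guess, here is a minimal sketch (the mount path is a placeholder for your symlinked .clearml directory) that tries to take a POSIX lock on the NAS mount:
```python
# hypothetical check: does the NAS-mounted .clearml directory support POSIX file locks?
import fcntl

with open("/mnt/nas/.clearml/.lock_test", "w") as f:  # replace with your actual path
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking exclusive lock
        print("POSIX locking works on this mount")
        fcntl.flock(f, fcntl.LOCK_UN)
    except OSError as err:
        print(f"POSIX locking failed: {err}")
```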
basically @<1554638166823014400:profile|ExuberantBat24> you can think of hyper-datasets as a "feature-store for unstructured data"
Okay here is a standalone code that should be close enough? (if I missed anything let me know)
```python
import tempfile
from datetime import datetime
from pathlib import Path

import tensorflow as tf
import tensorflow_datasets as tfds

from clearml import Task

task = Task.init(project_name="debug", task_name="test")

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)


def normalize_img(image, label):
    ...
```
I basically moved the Task.init() call below the imports
Okay that is odd, can you copy paste the before/after of the imports, so we can fix that?!
BoredHedgehog47 can you provide some logs, this is odd..
Hi MammothGoat53
Basically what you are missing are the headers with the Token you have:
https://blog.logrocket.com/secure-rest-api-jwt-authentication/
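For example, a minimal sketch of attaching the token as a bearer header (the endpoint and token are placeholders, not the actual ClearML API route):
```python
import requests

token = "<your-jwt-token>"  # the token you already have
response = requests.get(
    "https://your-server.example.com/some/endpoint",  # placeholder endpoint
    headers={"Authorization": f"Bearer {token}"},      # the missing header
)
print(response.status_code, response.text)
```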
BoredHedgehog47 could it be that "python" points to Python 2.7 inside your container, as opposed to Python 3 on your machine?
(this error is Python 2 trying to run Python 3 code)
https://stackoverflow.com/questions/20555517/using-multiple-versions-of-python
"Training classifier with command:\n python -m sfi.imagery.models.bbox_predictorv2.train
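A quick way to check which interpreter "python" resolves to inside the container (the image name is a placeholder):
```
docker run --rm <your-image> python --version
docker run --rm <your-image> python3 --version
```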
Hi EnviousStarfish54
docker on windows , with nvidia runtime support is only with WSL (I think)
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-wip
https://medium.com/@dalgibbard/docker-with-gpu-support-in-wsl2-ebbc94251cf5
Ohh I see, so basically the ASG should check if the agent is idle, rather than whether the Task is running?
Notice that if you are using TB, everything you report to the TB will appear as well
The thing I don't understand is how come this DOES work on our linux setups
I do not think it actually works... I could not find any code that converts the ENV in the config string...
I'll be happy to test it out if there's any commit available?
Please do, and feel free to PR it
https://github.com/allegroai/clearml/blob/d3e986393ac8d1a1ea48302224962570ab8e6f9e/clearml/backend_api/session/session.py#L576
https://github.com/allegroai/clearml/blob/d3e98639...
Maybe something similar to dockers
I like this approach, maybe we could add --name as well, so it is easier to name them.
trains-agent daemon stop --gpus all
trains-agent daemon stop --cpu-only
trains-agent daemon stop --gpus 0
What do you think?
Also being able to separate their configurations files would be good (maybe there is and I don't know?)
This is already supported with --config-file, see trains-agent --help for details
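For example, a rough sketch of running two daemons, each with its own configuration file (the file names are placeholders; check trains-agent --help for the exact flag placement):
```
trains-agent --config-file ~/trains_gpu.conf daemon --gpus 0
trains-agent --config-file ~/trains_cpu.conf daemon --cpu-only
```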
Hi FancyChicken53
This is a noble cause you are after
Could you be more specific on what you had in mind, I'll try to find the best example once I have more understanding ...
Hi BitterStarfish58
What's the clearml version you are using ?
dataset upload both work fine
Artifacts / Datasets are uploaded correctly ?
Can you test if it works if you change " http://files.community.clear.ml " to " http://files.clear.ml " ?
I think the ClearmlLogger is kind of deprecated ...
Basically all you need is Task.init at the beginning; the default TensorBoard logger will be caught by clearml
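A minimal sketch of what that looks like (assuming the PyTorch SummaryWriter, but a TF writer is captured the same way):
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tb-auto-logging")  # the only ClearML call needed

writer = SummaryWriter()
for step in range(10):
    writer.add_scalar("loss", 1.0 / (step + 1), step)  # shows up in the ClearML Scalars tab
writer.close()
```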
Please hit Ctrl-F5 to refresh the entire page, see if it is still empty...
JitteryCoyote63 I think this one:
https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py
Hi DilapidatedDucks58
how to force-reinstall package from github in Installed Packages
You mean make sure that the agent installs it from github?
The "Installed packages" section is equivalent to "requirements.txt" anything you can put in requirements.txt, you can put there.
For example, adding to "Installed Packages":
git+https://github.com/allegroai/clearml.git
Will make sure you install the latest clearml from GitHub.
Notice that you cannot have two packages with the same name (just like with regular requirements.txt)...
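So a hypothetical "Installed Packages" section could look exactly like a plain requirements.txt, e.g. (package versions here are only illustrative):
```
git+https://github.com/allegroai/clearml.git
numpy==1.21.0
boto3>=1.16
```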
DilapidatedDucks58
is there any way to post Slack alerts for the frozen experiments?
The latest RC should solve the PyTorch data loader issue, do you want to test it?
pip install clearml==0.17.5rc2
Hi SmugDog62
My guess is that there's an issue with the git repo detector.
Seems like you are correct
What are you getting on the execution tab?
Is the repo correct?
Do you see the notebook in the uncommited changes ?
The notebook path goes through a symlink a few levels up the file system (before hitting the repo root, though)
Hmm sounds interesting, how can I reproduce it?
The notebook kernel is also not the default kernel,
What do you mean?
We already have the feature-store to save all data, that's why I don't need to save it (just a reference to the dataset version).
that makes sense, so why don't you point to the feature store ?
I can have different steps of the pipeline running on different machines. But this is not my use case.
if they are running on the same machine you can basically return a path to the local storage, or change the output_uri to the local storage; this will cause them to get serialized to the local storage...
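A minimal sketch of the first option (passing a local path between steps; the path and step names are placeholders, assuming both steps run on the same machine):
```python
# hypothetical pipeline steps sharing data via a local path instead of uploading it
def prepare_features():
    features_path = "/mnt/local_storage/features.parquet"  # local storage both steps can see
    # ... write the features file to features_path ...
    return features_path  # only the path string is passed to the next step


def train_model(features_path):
    # ... read the features directly from the local path ...
    pass
```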
Is this from the pipeline logic? Or a component?