SubstantialElk6 I just executed it, and everything seems okay on my machine.
Could you pull the latest clearml-agent from GitHub and try again?
EDIT:
just try to run:
git clone https://github.com/allegroai/clearml-agent.git
cd clearml-agent
python examples/k8s_glue_example.py
Hi CheerfulGorilla72 ,
Sure there are:
https://github.com/allegroai/clearml/tree/master/examples/frameworks/pytorch-lightning
If I install using
pip install -r ./requirements.txt
then pip installs the packages in the order of the requirements file.
Actually this is not how it works: pip will install in whatever order it sees fit, and the order is not consistent between versions (it has to do with dependency resolution)
However, when ClearML installs the packages, it installs them in order, UNLESS a custom path is provided, in which case that entry is saved for last
Correct because the custom (I...
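To illustrate the custom-path behaviour described above, here is a hypothetical requirements.txt (package names are made up for the example); the point is that the agent keeps the listed order for regular packages and pushes the custom VCS/local entry to the end:

```
numpy==1.24.4
pandas==2.0.3
# custom path entry - the agent installs this one last
git+https://github.com/example/mypkg.git
```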
Hi @<1523706266315132928:profile|DefiantHippopotamus88>
The idea is that clearml-server acts as a control plane and can sit on a different machine; obviously you can run both on the same machine for testing. Specifically, it looks like clearml-serving is not configured correctly, as the error points to an issue with the initial handshake/login between the triton containers and the clearml-server. How did you configure the clearml-serving docker compose?
I'll make sure we have conda ignore git:// packages, and pass them to the second pip stage.
Hi SubstantialElk6
try:
--docker "<image_name> --privileged"
Notice the quotes
Let me check, which helm chart are you referring to ?
Sounds good, I assumed that was the case but I was not sure.
Let's make sure that in the clearml.conf we write it in the comment above the use_credentials_chain option, so that when users look for IAM roles configuration they can quick search for it 🙂
Hi LazyTurkey38
Is it possible to have the agents keep a local version and only download the diff of the job commit to speed things up?
This is what it does, it has a local cached copy and it only pulls the latest changes
Hi MagnificentSeaurchin79
Yes this is a bit confusing 🙂
Datasets are stored as delta changes from parent versions.
A dataset contains a list of files and a list of artifacts in which these files exist. This means that if we create a new dataset from a parent dataset and want to add a file, we add a link to the file and a new artifact containing just the delta (i.e. the new file) from the parent version. When you delete a file you just remove the li...
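Conceptually, resolving a dataset version from its parent deltas can be sketched in plain Python. This is an illustration of the idea only, not the ClearML API; all names here are made up:

```python
# Conceptual sketch: a version stores only a link to its parent plus the delta.
parent_version = {
    "files": {"data/train.csv", "data/val.csv"},
}
child_version = {
    "parent": parent_version,
    "added": {"data/test.csv"},    # new file -> stored as a new artifact
    "removed": {"data/val.csv"},   # deleted file -> only the link is removed
}

def resolve_files(version):
    """Walk up the parent chain and apply each delta to get the full file list."""
    if "parent" not in version:
        return set(version["files"])
    base = resolve_files(version["parent"])
    return (base | version.get("added", set())) - version.get("removed", set())
```

Resolving the child version yields the parent's files plus the added file, minus the removed one.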
Hi SarcasticSparrow10 ,
So the bad news is that the UI actually escapes the query, so you cannot search with a regexp from the UI. The good news: you can achieve that from Python:
from trains import Task
tasks = Task._query_tasks(task_name='exp.*i1')
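The pattern in the snippet above is a standard Python regular expression. As a quick illustration of what 'exp.*i1' would match against a list of task names (shown here with re.search; the task names are made up):

```python
import re

pattern = re.compile(r"exp.*i1")
task_names = ["exp_main_i1", "exp-i1", "baseline", "exp_i2"]
# keep only the names the pattern matches
matches = [name for name in task_names if pattern.search(name)]
```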
Hi CheerfulGorilla72
the "installed packages" section is used as the "requirements.txt" for the agent.
Are you saying the autodetection fails to detect all packages? You can specify in "manual execution" (i.e. not when the agent is running the code) to just take the requirements.txt locally:
Task.force_requirements_env_freeze(requirements_file="./requirements.txt")
# notice the above call should be executed before Task.init
task = Task.init(...)
3. If you clear all the "installed packages" se...
But once I see it on the UI it means it is already launched somewhere, so I didn't quite get you.
The idea is you run it locally once (think debugging your code, or testing it)
While running the code the Task is automatically created, then once in the system you can clone / launch it.
Also, I want to launch my experiments on a kubernetes cluster and i don't actually have any docs on how to do that, so an example can be helpful here.
We are working on documenting the full process, ...
Hi @<1541229818828296192:profile|HurtHedgehog47>
plots we create in the notebook are not saved as they were made.
I'm assuming these are matplotlib plots ?
Notice that ClearML tries to convert the plot into an interactive plot; in that process, colors and legends are sometimes lost (they become generic).
You can however manually report the plot, and force it to store it as non-interactive:
task.logger.report_matplotlib_figure(
title="Manual Reporting", series="Just a plot", ite...
This should work:
from clearml import Task
task = Task.init(project_name="examples", task_name="shap example")
import xgboost
import shap
# train an XGBoost model
X, y = shap.datasets.california()
model = xgboost.XGBRegressor().fit(X, y)
# explain the model's predictions using SHAP
# (same syntax works for LightGBM, CatBoost, scikit-learn, transformers, Spark, etc.)
explainer = shap.Explainer(model)
shap_values = explainer(X)
# visualize the first prediction's explanation
shap.plots...
I see TightElk12
You can always set up the OS environment variables:
CLEARML_API_HOST
CLEARML_WEB_HOST
CLEARML_FILES_HOST
with the correct configuration. Or you can simply set CLEARML_NO_DEFAULT_SERVER=1 which will prevent any usage of the default demo server.
wdyt?
I’d definitely prefer the ability to set a docker image/docker args/requirements config for the pipeline controller too
That makes sense, any chance you can open a github issue with feature request so that we do not forget ?
The current implementation will upload the result of the first component, and then the first thing the next component will do is download it.
If they are on the same machine, it should be cached when accessed the 2nd time
Wouldn’t it be more performant f...
The downstream stages are rankN scripts, they are waiting for the IP address of the first stage.
Is this like a multi-node training, rather than a pipeline ?
Thanks BattyLion34 I fixed the code snippet :)
So I assume, trains assumes I have nvidia-docker installed on the agent machine?
docker + nvidia-docker-runtime are assumed to be installed
nvidia/cuda docker image is pulled when requested (like any other container image)
Moreover, since I'm going to use
Task.execute_remotely(and not through the UI) is there any code way to specify the docker image to be used?
Sure, task.set_base_docker(docker_cmd='nvidia/cuda -v /mnt:/tmp')
Notice that you can not only pass the dock...
It is deployed on an on premise, secured network that has no access to the outside world.
Is it password protected or something of that nature?
Perhaps we could find a different solution or work around, rather than solving a technical issue.
Solving it means allowing the python code to ask the JupyterLab server for the notebook file
However, once working with ClearML and using a venv (and not the default python kernel),
Are you saying on your specific setup (i.e. OpenShif...
restart the notebook kernel ?
Hi EnchantingWorm39
Great question!
Regarding the data management, I know the enterprise edition has full support for unstructured data, and we plan to soon have a solution for structured data as part of the open source (soon = hopefully within a month)
Regarding model serving, I know you can integrate with TFServing or Seldon with very little effort (usually the challenge is creating triggers etc., but in most cases this is custom code anyhow 🙂 )
I do not have experience with Cortex/B...
Thanks @<1523704157695905792:profile|VivaciousBadger56> ! great work on the docstring, I also really like the extended example. Let me make sure someone merges it
It might be the file upload was broken?
instead of the one that I want or the one of the env which it is started from.
The default is the python that is used to run the agent. You can override it in clearml.conf:
agent.ignore_requested_python_version = true
agent.python_binary = /my/selected/python3.8
Hi JumpyDragonfly13
- is "10.19.20.15" accessible from your machine (i.e. can you ping to it)?
- Can you manually SSH to 10.19.20.15 on port 10022 ?
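The two manual checks above can be scripted with the standard library; this is just a reachability sketch (the host/port values in the comment come from the question):

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Usage, with the values from the question:
# can_connect("10.19.20.15", 10022)
```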
- Maybe we should add an option, archive components as well ...