Hi @<1566596960691949568:profile|UpsetWalrus59>
Could it be that the two experiments have the exact same name?
(It sounds like a bug in the UI, but I'm trying to make sure, and also to understand how to reproduce it)
What's your clearml-server version?
Hi all! Does anyone know a solution to my issue with deploying models saved on Azure to the clearml-serving docker container?
Hi NuttyCamel41
The easiest is to map the clearml.conf into both the serving and triton containers in your docker-compose.yaml (or k8s secrets), and make sure the conf file has the credentials to access the azure blob. wdyt?
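For example, a minimal sketch of the azure section such a mounted clearml.conf would need (the account name, key and container name below are placeholders):
```
azure.storage {
    containers: [
        {
            account_name: "my_account"
            account_key: "my_account_key"
            container_name: "models"
        }
    ]
}
```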
Hi @<1719524641879363584:profile|ThankfulClams64>
I am using ClearML Pro and pretty regularly I will restart an experiment and nothing will get logged to ClearML.
I use ClearML with pytorch 1.7.1, pytorch-lightning 1.2.2 and Tensorboard auto
All ClearML packages are on the latest stable versions (clearml 1.7.4, clearml-agent 1.7.2)
Is this still happening with the latest clearml (clearml==1.16.3rc2)?
What is the TB version?
I remember a fix regarding lightning support
Also just making s...
Thanks @<1719524641879363584:profile|ThankfulClams64>, having code that can reproduce it is exactly what we need.
One thing I might have missed, and it is very important: what is your tensorboard package version?
ReassuredTiger98 if this user passes the following to the task as docker args, it might work:
'-e CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1'
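For reference, a minimal sketch of setting it programmatically (assuming a recent clearml SDK; the image name is a placeholder):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="skip venv install")
# ask the agent to reuse the container's python environment instead of building a venv
task.set_base_docker(
    docker_image="python:3.9",
    docker_arguments="-e CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1",
)
```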
Nothing that can't be worked around, but for automation I don't think creating a TriggerScheduler with an existing name should be allowed
DangerousDragonfly8 I think I understand: basically you are saying that the fact a user can create two triggers with the same name can create some confusion?
It also sucks a bit that each TriggerScheduler will run in its own pod in kubernetes.
Actually this depends on how you spin it, and you can actually spin a single services agent running multiple...
TrickySheep9 you mean custom containers in clearml-session for remote development?
Meanwhile check CreateFromFunction.create_task_from_function(...)
It might be better suited than execute_remotely for your specific workflow 🙂
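A rough sketch of how that could look (the exact import path and argument names may differ between clearml versions):
```python
from clearml.backend_interface.task.populate import CreateFromFunction

def train(epochs=10):
    print(f"training for {epochs} epochs")

# creates a draft Task wrapping the function, ready to be enqueued for an agent
task = CreateFromFunction.create_task_from_function(
    a_function=train,
    function_kwargs={"epochs": 10},
    project_name="examples",
    task_name="train from function",
)
```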
Hi WickedBee96
How can I do that?
clearml-task
https://clear.ml/docs/latest/docs/apps/clearml_task#what-is-clearml-task-for
I only know how to run it in the agent by enqueuing the draft after running it on my local machine, so is there another way?
Or maybe you are looking for task.execute_remotely
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
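A minimal sketch (the queue name is a placeholder):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# up to here the code runs locally; this call stops the local process,
# enqueues the task on the given queue, and an agent reruns the full script remotely
task.execute_remotely(queue_name="default", exit_process=True)
```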
Thank you @<1719524641879363584:profile|ThankfulClams64> for opening the GitHub issue, hopefully we will be able to reproduce it and fix it quickly
That should work 🙂
BTW, you might play around with "clearml-agent execute --id <task_id_here>"
This will basically clone the code, create a venv with the python packages, apply uncommitted changes and run the actual code. This could be a replacement for your bash script. (Notice it means you need to clone the Task in the UI first; then you can change parameters, then run the agent manually in SLURM and it will take the params from the UI.)
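For example, a sketch of a SLURM batch script wrapping it (the resources and task ID are placeholders):
```bash
#!/bin/bash
#SBATCH --job-name=clearml-task
#SBATCH --gres=gpu:1
# clone the code, build the venv, apply uncommitted changes, and run the task
clearml-agent execute --id <task_id_here>
```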
I found the issue, the first run it jumps over the first day (let me check if we can quickly fix that)
Hi UnevenDolphin73
Can one compare experiments/tasks from different projects?
Yes, the easiest way is to go to the parent project ("all projects" if they have no common parent), then search for the specific Tasks (i.e. filter or use the search bar), then multi-select them.
wdyt?
DS, this way they only need to remember (and I only need to teach them where to find) one id.
Yes, that's the point. This ID is the Model UID (as opposed to the Task ID). The reason I kind of "insist" on it is that the Model ID is built into the system, meaning this is how you register it, as opposed to the Task ID that somehow needs to be hacked/passed externally
TBH the main reason I went with our API is that because of the custom model loading, we need to use the "custom" framew...
Hi @<1658281099807166464:profile|SmallCamel52>
Lack of authentication in all versions of the fileserver component
Are you leaving the fileserver open to the world?
Hi MammothGoat53
Do you mean working with the REST API directly?
https://clear.ml/docs/latest/docs/references/api/events
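For example, a sketch of pulling a task's console log through the events endpoint (the server URL, credentials and task id are placeholders):
```bash
curl -s -u <ACCESS_KEY>:<SECRET_KEY> \
  -H "Content-Type: application/json" \
  -d '{"task": "<task_id>", "batch_size": 100}' \
  https://api.clear.ml/events.get_task_log
```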
, but it seems like I can only trigger a task using a Task scheduler but not a pipeline.
@<1523701132025663488:profile|SlimyElephant79> Maybe we should state it better, but a Pipeline is "just" another type of Task, so triggering a Task with the Pipeline ID is essentially triggering the pipeline (do notice you need to select the "services" queue to be used so that the pipeline runs on the correct resource). Makes sense?
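For example, a sketch of scheduling a pipeline by its Task ID (the ID and schedule below are placeholders):
```python
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<pipeline_task_id>",  # the pipeline controller's Task ID
    queue="services",                       # run the pipeline on the services queue
    hour=6, minute=0,                       # every day at 06:00
)
scheduler.start_remotely(queue="services")
```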
the first runs perfectly fine,
Just making sure, running in an agent?
the second crashes
Running inside the same container as the first one?
Yes, that makes sense. Then you would need to use either the AWS vault features, or the ClearML vault features ...
Hi AbruptHedgehog21
can you send the two models' info pages (i.e. the original and the updated one)?
do you see the two endpoints?
BTW: --version would add a version to the model (i.e. create a new endpoint with version "endpoint/{version}")
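For example, a sketch of registering the updated model as a new version of an existing endpoint (the IDs are placeholders, and the exact flags may vary between clearml-serving versions):
```bash
clearml-serving --id <service_id> model add \
  --engine triton \
  --endpoint "my_model" --version 2 \
  --model-id <new_model_uid>
# the new model is then served at "my_model/2" alongside the original "my_model"
```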
the services queue (where the scaler runs) will be automatically exposed to the new EC2 instances?
Yes, using the extra_clearml_conf parameter you can add configuration that will be passed to the clearml.conf of the instances it will spin. For example, the value you would want to add here is: agent.extra_docker_arguments: ["-e", "ENV=value"]
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
wdyt?
Actually if you can send the full log of the Task that would be great
Okay, I'll make sure we change the default image to the runtime flavor of nvidia/cuda
Quite hard for me to try this right now
👍
How do I reproduce it?
The task pod (experiment) started reaching out to an IP associated with malicious activity. The IP was associated with 1000+ domain names. The activity was identified in AWS guard duty with a high severity level.
BoredHedgehog47 What is the pod container itself?
EDIT:
Are you suggesting the default "ubuntu:18.04" is somehow contaminated?
https://hub.docker.com/layers/library/ubuntu/18.04/images/sha256-d5c260797a173fe5852953656a15a9e58ba14c5306c175305b3a05e0303416db?context=explore
and then in Preprocess:
self.model = get_model(task_id=os.environ['TASK_ID'], model_name=os.environ['MODEL_NAME'])
That's the part I do not get: Models have their own entity (with a UID), in contrast to artifacts, which are only stored on Tasks.
The idea is that when you are registering a model with clearml-serving, you can specify the model ID; this should replace the need for the TASK_ID + model_name in your code, and clearml-serving will basically bring it to you
Basically this fun...
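For illustration, a minimal sketch of fetching a model directly by its ID instead of by task + name (the model id is a placeholder):
```python
from clearml import InputModel

# look the model up by its own UID, no Task ID or model name needed
model = InputModel(model_id="<model_uid>")
local_weights_path = model.get_local_copy()  # downloads the model file locally
```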