Hi CheekyElephant36
First you need to run it once on your machine. Once this is done (only a few steps are enough), you can clone it and enqueue it. Then, to actually connect the AWS autoscaler (the part that spins up machines and runs tasks), go to Applications and select the AWS autoscaler.
BTW I think the next video will be about YOLO + autoscaler
Hi CleanPigeon16
Put the specific git repository into the "installed packages" section
It should look like:... git+
...
(No need for the specific commit, you can just take the latest)
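For reference, a hypothetical requirements-style line (the repository URL below is made up, not the one elided above):
```
git+https://github.com/your-org/your-repo.git
```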
Hmm, can you test with the latest RC?
```
pip install clearml==0.17.6rc1
```
So why is it trying to upload to "//:8081/files_server:"?
What do you have in the trains.conf on the machine running the experiment?
And can I store models with no attachment to tasks?
Assuming you have the Model ID:
```python
from clearml import InputModel

model = InputModel(model_id='aabbcc')
local_file_or_folder = model.get_weights()
```
Is this what you are looking for?
I can't think of any actual difference in flow ...
Can you try the following?
```python
task._setup_reporter()
task.set_initial_iteration(0)
```
I don't have the compose file, or at least can't seem to find it in /opt
You can manually take down all dockers with:
```
docker ps
```
then, for each container id:
```
docker stop <container id>
```
PanickyMoth78 quick update: the fix is already being tested, I'm hoping for an RC tomorrow 🙂
Hi WickedGoat98
This sounds like a great design (obviously you have scale in mind 😉 ) Feel free to ask "stupid" questions, based on what you already wrote I doubt they will be
A few questions that come to mind (probably a few others after):
You mentioned FS synchronization, from where? i.e. what is the single source of truth? K8s (Rancher 2.0 is basically a k8s manager) can take care of mounting volumes, so no need to sync. Is this a valid solution?
BTW: (you can drag and drop an i...
can I mount the s3 bucket as file system on place where
You need to mount it where the file server is storing its files, correct (notice: not the DBs, just the file server)
SparklingElephant70, let me make sure I understand: the idea is to make sure the pipeline will launch a specific commit/branch, and that you can control it? Also, are you using the pipeline add_step function, or are you decorating a function with PipelineDecorator?
CooperativeFox72 a bit of info on how it works:
In "manual" execution (i.e. without an agent):
```python
path = task.connect_configuration(local_path, name=name)
```
path equals local_path, and the content of local_path is stored on the Task.
In "remote" execution (i.e. agent):
```python
path = task.connect_configuration(local_path, name=name)
```
local_path is ignored, path is a temp file, and the content of the temp file is the content that is stored (or edited) on the Task configuration.
Make sense?
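A minimal sketch of the pattern (project, task, and file names here are made up):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")

local_path = "config.yaml"  # hypothetical local configuration file
path = task.connect_configuration(local_path, name="my_config")

# Manual run: path == local_path, and the file content is stored on the Task.
# Remote run (agent): path points to a temp file holding the configuration
# stored (or edited) on the Task, so always read from `path`, not `local_path`.
with open(path) as f:
    config_text = f.read()
```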
Hi RoundMosquito25
This is a bit old but probably a good start:
https://clear.ml/blog/stacking-up-against-the-competition/
tl;dr
ClearML advantages (at least a few I can think of):
- Scales way better
- Enables out-of-the-box experiment orchestration (i.e. remote execution etc.)
- Data management
- Nicer UI
- Full REST API
- Full MLOps platform
- Model serving
- Query-able model repository
Probably more 🙂
eval built-in. wdyt?
eval is never recommended, as basically you could do Args/float='os.system("rm ...")' 🙂
In theory type is stored on the hyper parameter (this is a relatively new feature the backend supports)
The casting, though, is done based on the original value type, which means Task.connect needs to be called with the original dict. Is there a specific reason for using get_parameters instead of task.connect?
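A quick sketch of why connecting the original dict matters (project/task names and parameter values are made up):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params demo")

# Connect the original dict so ClearML records each value's original type
params = {"lr": 0.001, "epochs": 10, "optimizer": "adam"}
task.connect(params)

# When executed remotely, values overridden in the UI are cast back to the
# original types: params["lr"] stays a float, params["epochs"] stays an int.
```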
However, despite having imported the required types from the typing library in the script where the function decorated with PipelineDecorator.component is defined, later in the generated script the typing library is not imported outside the scope of the function
Actually the typing part is not passed to the "created step", because there are no global imports. For example:
```python
def step(a: pd.DataFrame):   # fails in the generated step: pd is not defined at module level
    import pandas as pd      # an import inside the function does not cover the annotation
    ...
```
I want to inject a bash command after the repo has been cloned (and maybe even after the venv has been installed).
LazyTurkey38 the created venv inherits from the system environment, so in theory you can do all the installation on the system python and the created venv will just inherit the packages, no?
(btw: just to clarify, there is only one entry point for the custom bash script and that is before everything, so users can configure the container before the agent starts)
Sure. JitteryCoyote63, so what was the problem? Can we fix something?
Hi AdventurousWalrus90
Thank you for the kind words! 😊
/home/usr_338436_ulta_com/.clearml/venvs-builds/3.7/.gitignore
So this is the error on the agent?
using caching where specified but the pipeline page doesn't show anything at all.
What do you mean by "the pipeline page doesn't show anything at all"? Are you running the pipeline? How?
Notice PipelineDecorator.component needs to be top-level, not nested inside the pipeline logic, like in the original example:
```python
@PipelineDecorator.component(
    cache=True,
    name=f'append_string_{x}',
)
```
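A minimal sketch of the top-level pattern (component and pipeline names here are made up):
```python
from clearml.automation.controller import PipelineDecorator

# Defined at module (top) level, not inside the pipeline function
@PipelineDecorator.component(cache=True, name="append_string")
def append_string(s: str) -> str:
    return s + "-suffix"

@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="0.1")
def pipeline_logic():
    result = append_string("hello")

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug locally; drop for remote execution
    pipeline_logic()
```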
LazyTurkey38
The last part makes sense, not sure I get the "if clone", we are calling execute_remotely, so I'm assuming we do not need to clone ourselves, but send the current Task.
Other than that yes, makes sense (BTW, assuming you have upgraded the server to >=1.0, you can just do mark_stopped, no need to reset)
Maybe there is a setting in docker to move the space used to a different location?
Not that I know of...
I can simply increase the storage of the first disk, no problem with that
probably the easiest 🙂
But as you described
it looks like an edge case, so I don’t mind
🙂
Very odd, I still can't reproduce. This is just the cleanup service running, without anything else?
What's the clearml version it is using?
@<1699955693882183680:profile|UpsetSeaturtle37> can you try with the latest clearml-session (0.14.0) I remember a few improvements there
The remote machine is in Azure behind the load-balancer; we are using docker images, so directly connecting to pods.
Yeah, the LB in the middle might be introducing SSH hiccups. First upgrade to the latest clearml-session; it better configures the SSH client/server to support longer connection timeouts. If that does not work, try the --keepalive=true flag.
Le...
Quick update, I found the issue, working on a fix 🙂
Hi @<1541229818828296192:profile|HurtHedgehog47>
plots we create in the notebook are not saved as they were made.
I'm assuming these are matplotlib plots?
Notice that ClearML tries to convert the plot into an interactive plot; in that process, colors and legends are sometimes lost (become generic).
You can however manually report the plot, and force it to be stored as non-interactive:
```python
import matplotlib.pyplot as plt

task.logger.report_matplotlib_figure(
    title="Manual Reporting",
    series="Just a plot",
    iteration=0,                # assumed iteration value; adjust as needed
    figure=plt.gcf(),
    report_interactive=False,   # store as a static (non-interactive) plot
)
```
Hi @<1556450111259676672:profile|PlainSeaurchin97>
While testing the migration, we found that all of our models had their MODEL URL set to the IP of the old server.
Yes, all the artifacts/models/debug-samples are stored "as is"; this means that if you configured your original setup with an IP, it is kind of stuck there. This is why it is always preferred to use a host-name ...
you apparently also need to rename all model URLs
Yes 😞
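If you do end up rewriting the stored URLs, here is a rough sketch using the APIClient (the addresses are made up, and treating models.edit accepting a uri field as an assumption to verify against your server version):
```python
from clearml.backend_api.session.client import APIClient

OLD = "http://10.0.0.1:8081"                   # hypothetical old server IP
NEW = "http://clearml-files.example.com:8081"  # hypothetical new host name

client = APIClient()
for model in client.models.get_all():
    if model.uri and model.uri.startswith(OLD):
        # assumption: models.edit accepts a uri field on your server version
        client.models.edit(model=model.id, uri=model.uri.replace(OLD, NEW))
```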
And the agent continues running.
Oh, just kill all the processes with clearml-agent in the cmd line:
```
pkill -9 -f clearml-agent
```