
Exactly :)
If you feel like PR-ing a fix, it will be greatly appreciated :)
Hi ReassuredOwl55
a dialogue box that opens with a "deleting" bar swishing across, but then it just hangs, and becomes completely unresponsive
I believe this issue was fixed in the latest server version. It seems you are running 1.7, but the latest is 1.9.2. May I suggest an upgrade?
The AWS autoscaler will work with IAM roles as long as you have them configured on the machine itself. For SageMaker job scheduling (I'm assuming this is what you are referring to, and not the notebook), you need to select the instance as well (basically the same as EC2). What do you mean by using the k8s glue? Do you mean inheriting and implementing the same mechanism, but for SageMaker instead of kubectl?
Hi SquareFish25
Sure, here are a few:
HPO
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
Pipeline
https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py
Automation:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
After removing the task.connect lines, it encountered another error related to 'einops' that is not recognized. It does exist in my environment file but was not installed by the agent (according to what I see under 'Summary - installed python packages'). Should I add this manually?
Yes, I'm assuming this is a derivative package that is needed by one of your packages?
from clearml import Task
# add_requirements must be called before Task.init
Task.add_requirements("einops")
task = Task.init(...)
Hmm, it seems as if the task.set_initial_iteration(0) is ignored...
What's the clearml version you are using?
Is it the same one you have on the local machine?
Exactly, just pointing out that the machine is yours ;)
do you suggest to delete those first?
It might make it easier on the server. I think there is a bug there: when deleting many tasks, it tries to parallelize the delete process but fails to sync properly. Anyhow, this is fixed and will be pushed with the next clearml-server version.
Hi @<1524922424720625664:profile|TartLeopard58>
- Opened container ports for VS Code, JupyterLab, and SSH.
I think that by default it uses the host network, so it can take care of that. Are you saying you added k8s integration?
- Added NodePort to the service to directly access via public IP:NodePort (previously only SSH was available, but now NodePort is added for VS Code and JupyterLab as well), allowing direct access without SSH tunneling.
Interesting!
- Considering security vulnerabilitie...
Maybe WackyRabbit7 is a better approach as you will get a new object (instead of the runtime copy that is being used)
AntsyElk37
and when i try to use --output-uri i can't pass true because obviously i can't pass a boolean only strings
Hmm, that sounds right. I think we should fix that so when using --output-uri true the value that is passed is actually True, not the string "true".
Regarding the issue itself:
Are you saying --skip-task-init is being ignored, and it always adds the Task.init call? You can also pass --output-uri https://files.clear.ml (which is the same as True), ...
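The fix suggested above (turning the string "true" into the boolean True) can be sketched with plain argparse. This is only an illustration under assumptions: `parse_output_uri` and `build_parser` are hypothetical names, and this is not how the actual CLI is implemented.

```python
import argparse


def parse_output_uri(value: str):
    """Map 'true'/'false' (any case) to booleans; keep anything else as a URI string."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return value


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the --output-uri flag name comes from the thread.
    parser = argparse.ArgumentParser(description="hypothetical CLI sketch")
    parser.add_argument("--output-uri", type=parse_output_uri, default=None)
    return parser


# Parse an explicit argument list so the sketch runs anywhere.
args = build_parser().parse_args(["--output-uri", "true"])
print(repr(args.output_uri))
```

With a converter like this, an explicit URL such as https://files.clear.ml still passes through unchanged as a string.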
GrievingTurkey78 did you open the 8008 / 8080 / 8081 ports on your GCP instance? (I have to admit I can't remember where exactly in the admin panel you do that, but I can assure you it is there :)
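A quick way to verify from another machine that those three ports are reachable is a plain TCP connect test. This is a minimal sketch using only the standard library; the helper names are made up for illustration, and the host you pass in would be your instance's public address.

```python
import socket

# The three ports mentioned above.
CLEARML_PORTS = (8008, 8080, 8081)


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports=CLEARML_PORTS) -> dict:
    """Map each port to whether it accepted a connection."""
    return {port: is_port_open(host, port) for port in ports}
```

Calling `check_ports("<your-instance-ip>")` returns a dict; any port mapped to False is probably still blocked by a firewall rule or has nothing listening on it.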
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
I think I understand what's going on. In order for the pipeline logic to be "aware" of the pipeline component, it needs to be declared in the pipeline logic script file (or scope, if you will).
Try adding from src.testagentcomponent import step_one at the global level of the pipeline script as well (not just inside the function).
Hi @<1541954607595393024:profile|BattyCrocodile47>
see here: None
Try with app.clearml.mlops-club.org and the rest of them
Assuming the git repo looks something like:
.git
readme.txt
module
|
+---- script.py
The working directory should be "."
The script path should be: "-m module.script"
And under Configuration/Args, you should have:
args1 = value
args2 = another_value
Make sense?
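To see why the working directory and the "-m module.script" path go together, here is a small local sketch. It recreates the layout above in a temporary folder and runs the module from the repo root. Passing the Args as command-line flags is an assumption made for the demo (the agent's exact mechanism is not shown here), and the script body is invented.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Invented stand-in for module/script.py: it just echoes its parsed arguments.
SCRIPT = '''\
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--args1")
parser.add_argument("--args2")
print(vars(parser.parse_args()))
'''


def run_module_from_repo_root() -> str:
    """Create <repo>/module/script.py and run it as `python -m module.script` with cwd=<repo>."""
    with tempfile.TemporaryDirectory() as repo_root:
        module_dir = Path(repo_root) / "module"
        module_dir.mkdir()
        (module_dir / "script.py").write_text(SCRIPT)
        result = subprocess.run(
            [sys.executable, "-m", "module.script",
             "--args1", "value", "--args2", "another_value"],
            cwd=repo_root,  # working directory "." relative to the repo root
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_module_from_repo_root())
```

Running the same command from inside the module folder instead of the repo root would fail to resolve module.script, which is why the working directory matters.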
My task starts up and checks the mounted EFS volume for x data, if x data does not exist there, it then pulls x data from S3.
BoredHedgehog47 you can just use StorageManager and configure the clearml cache to live on the EFS, it will essentially do the same :)
Regarding the helm chart with EFS,
you need to configure the clearml-glue pod template with the EFS mount
example :
https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/e7f647f4e6fc76f983d61522e635353005f1472f/examples/kubernetes/volu...
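The "check the EFS mount first, pull from S3 only if missing" flow described at the start of this exchange boils down to a cache lookup. Below is a generic sketch of that pattern, not the StorageManager implementation: `get_or_fetch` and the `fetch` callable are hypothetical, and the real download logic is left to whatever you plug in.

```python
from pathlib import Path
from typing import Callable


def get_or_fetch(cache_dir: Path, name: str, fetch: Callable[[Path], None]) -> Path:
    """Return cache_dir/name, calling fetch(target) only if it is not already there."""
    target = cache_dir / name
    if not target.exists():
        # Cache miss: make sure the folder exists, then let the caller download into it.
        cache_dir.mkdir(parents=True, exist_ok=True)
        fetch(target)
    return target
```

Here `cache_dir` would point at the shared mount and `fetch` would wrap the actual remote download, so every task after the first one reuses the copy already on the shared volume.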
ok so i accidentally (probably with luck) noticed the max_connection: 2 in the azure.storage config.
NICE!!!! :)
But wait, where is that set?
None
Should we change the default or add a comment?
and this path should follow linux folder structure not a single file like the current .zip.
I like where this is going :)
So are we thinking of a "shared" folder where the data is kept "warm", plus a single source of truth where the packaged zip file is stored (like object storage, e.g. S3)?
JumpyPig73 Do you see all the configurations under the Args section in the "Configuration" tab?
(Maybe I'm wrong and the latest RC does not include the python-fire support)
If you have an idea of where to start looking for a quick win, I'm open to suggestions :)
I ended up using task = Task.init(continue_last_task=task_id) to reload a specific task and it seems to work well so far.
Exactly, this will initialize and auto-log the current process into the existing task (task_id). Without the continue_last_task argument, it will just create a new Task and auto-log everything to it :)
Hi @<1541592204353474560:profile|GhastlySeaurchin98>
During our first large hyperparameter run, we have noticed that there are some tasks that get aborted with the following console log:
This looks like the HPO algorithm doing early stopping. Which algo are you using?
Besides that, what are your impressions on these serving engines? Are they much better than just creating my own API + ONNX or even my own API + normal Pytorch inference?
I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when serving multiple models, as well as the ability to quickly update models from the clearml model repository (just tag + publish and the end...