We do upload the final model manually.
wait you said upload manually, and now you are saying "saved automatically", I'm confused.
Hi SmallDeer34
The any generally any pytorch.save(...) is logged/uploaded by clearml
automatically. specifically in your case I think the only missing one is the trainer_sate.json, which I assume is general json file, and I imagine is part of huggingface framework. You can easily upload it as additional artifact with Task.upload_artifact
wdyt?
The difference is that I want a single persistent machine, with a single persistent python script that can pull execute and report multiple tasks
So basically instead of using the agent, so simply spin a sub process ?
I think the limit is a few GB, I'm not sure, I'll have to check
And yes the oldest experiments will be deleted first (with the exception of published experiments, they will be deleted last)
Oh my bad, post 0.17.5 😞
RC will be out soon, in the meantime you can install directly from github:pip install git+
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
Hi CheekyElephant36
First you need to run it once on your machine, once this is done (only a few steps is enough), you can one it and enqueue it. Then to actually connect the aws autoscaler (the part that spins machines and runs tasks) go to applications and select the aqs autoscaler.
Btw i think the next video will be about YOLO + autoscaler
Do you think this is better ? (the API documentation is coming directly from the python doc-string, so the code will always have the latest documentation)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633
Will this still be considered asÂ
global site-packages
This is a pip settings, I "think" it inherits from the local user's installation, but I would actually install with "sudo pip" that will definitely be "inherited"
Exactly !
it seems like each task is setup to run on a single pod/node based on the attributes like
gpu memory
,
os
,
num of cores,
worker
BoredHedgehog47 of course you can scale on multiple node.
The way to do that is to create a k8s Yaml with replicas, each pod is actually running the exact same code with the exact same setup, notice that inside the code itself the DL frameworks need to be able to communicate with one another and b...
Actually this is by default for any multi node training framework torch DDP / openmpi etc.
OutrageousSheep60
I found the task in the UI -
and in the
UNCOMMITTED CHANGES
execution section there is
No changes logged
This is the issue.
and then run the
session
via docker
clearml-session --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 \ --packages "clearml" "tensorflow>=2.2" "keras" \ --queue MY_QUEUE \ --verbose
Are you running the "cleamrl-session" from your machine? (i.e. not from inside a docker) ?...
Hi JuicyFox94 ,
Actually we just added that 🙂 (still on GitHub , RC soon)
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L696
Thanks ShallowCat10 !
I'll make sure we fix it 🙂
Hi @<1556812486840160256:profile|SuccessfulRaven86>
Please notice that the clearml serving is not designed for public exposure, it lacks security layer, and is designed for easy internal deployment. If you feel you need the extra security layer I sugget either add external JWT alike authentication, or talk to the clearml people, their paid tiers include enterprise grade security on top
from clearml import TaskTypes
That will only work if you are using the latest from the GitHub, I guess the example code was modified before a stable release ...
Hi OddShrimp85
If you pass 'output_uri=True' to task init, it will upload the model automatically, or as you said manually with outputmodel class
Hi SubstantialElk6
but in terms of data provenance, its not clear how i can associate the data versions with the processes that created it.
I think DeliciousBluewhale87 ’s approach is what we are aiming for, but with code.
So using clearml-data
from CLI is basically storing/versioning of files (with differentiable based storage etc, but still).
What ou are after (I think) is in your preprocessing code using the programtic Dataset class, to create the Dataset from code, this a...
Hi SteadySeagull18
What does the intended workflow for making a "pipeline from tasks" look like?
The idea is if you have existing Tasks in the system and you want to launch them one after the other with control over inputs (or outputs of them) you can do that, without writing any custom code.
Currently, I have a script which does some
Task.create
's,
Notice that your script should do Task.init - Not Task.create, as Task create is designed to create additional ...
I'm a bit confused between the distinction / how to use these appropriately --
Task.init
does not have
repo
/
branch
args to set what code the task should be running.
It detects it automatically at run time 🙂 based on what is actually being used
My ideal is that I do exactly what
Task.create
does, but the task only goes into the pipeline section rather than making a new one in the experiments section.
Do y...
, how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?
A bit of how clearml-agent works (and actually on how clearml itself works).
When running manually (i.e. not executed by an agent), Task.init (and similarly task.connect etc.) will log data on the Task itself (i.e. will send arguments /parameters to the server), This includes logint the argparser for example (and any other part of the automagic or manuall connect).
When run...
Try this one 🙂HyperParameterOptimizer.start_locally(...)
https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer#start_locally
I'm trying to achieve a workflow similar to the one
You mean running everything on a single machine (manually)?
Oh if this is the case you can probably do
` import os
import subprocess
from clearml import Task
from clearml.backend_api.session.client import APIClient
client = APIClient()
queue_ids = client.queues.get_all(name="queue_name_here")
while True:
result = client.queues.get_next_task(queue=queue_ids[0].id)
if not result or not result.entry:
sleep(5)
continue
task_id = result.entry.task
client.tasks.started(task=task_id)
env = dict(**os.environ)
env['CLEARML_TASK_ID'] = ta...
That depends on the HPO algorithm, basically the will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It also might be a Bayesian process, i.e. based on previous set of parameters and runs, like how hyper-band works (optuna/hpbandster)
Make sense ?
ExcitedFish86 that said if running in docker mode you can actually pass it on a Task basis with:-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/venv/bin/python
as an additional docker container argument on the Task "Execution" tab itself.