SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it clones the code using the git credentials available on the host itself, or the git_user/git_pass configured in ~/clearml.conf:
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
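For reference, the relevant section of ~/clearml.conf looks something like this (values here are placeholders):

```
agent {
    # Git credentials used when the agent clones the Task's repository
    git_user: "my-git-user"
    git_pass: "my-git-password-or-token"
}
```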
The agent can work in two modes:
Virtual environment mode, where it will create a new venv for each experiment ba...
Hi SubstantialElk6
Yes, this is the queue the glue will pull jobs from and push into the k8s cluster. You can create a new queue from the UI (go to the Workers & Queues page, open the Queues tab, and press "Create new").
Ignore it 🙂 this is only needed if you are using config maps and need TCP routing to your pods.
As you noted, these are basically all the arguments you need to pass for (2). Ignore them for the time being.
These are the k8s overrides to use if launching the k8s job with kubectl (basically --override...
Actually it hasn't changed ...
python k8s_glue_example.py --help
to list all the configuration options.
You should probably pass a few :)
Can you run the entire thing on your own machine (just making sure it doesn't give this odd error) ?
I think it was just pushed. For nested calls you have to use the new argument for the decorator, helper_function:
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L2392
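A minimal, untested sketch of the idea (the argument name is assumed to be `helper_functions`, a list; check the linked controller.py for the exact name and signature — all project/step names here are hypothetical):

```python
def build_pipeline():
    # imported lazily so this file can be parsed without clearml installed
    from clearml.automation.controller import PipelineDecorator

    def normalize(x):
        # nested helper called from inside a component
        return x / 10.0

    # pass the helper explicitly so it is packaged together with the
    # component's standalone task (argument name assumed, see lead-in)
    @PipelineDecorator.component(return_values=["y"], helper_functions=[normalize])
    def step(x):
        return normalize(x)

    @PipelineDecorator.pipeline(name="example", project="examples", version="0.1")
    def main(x=5):
        return step(x)

    return main
```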
In the main pipeline I want to work with the secondary pipeline and other functions decorated with
PipelineDecorator
. Does ClearMl allow this? I have not been able to get it to work.
Usually when we think about pipelines of pipelines, the nested pipeline is just another Task you are running in the DAG (where the target queue is the services queue).
When you say nested pipelines with decorators, what exactly do you have in mind ?
Hi SubstantialElk6
I'm not sure what you are asking 🙂
Basically the clearml-agent
will pull a Task from an execution queue, and execute it (based on the definition on the Task, i.e. git repo, python packages docker image etc.)
Just to get the full picture, are we expecting to see the newly created step (aka eager execution) on the original pipeline (i.e. as part of the DAG visualization)?
GiganticTurtle0
I'm assuming here that self.dask_client.map(read_and_process_file, filepaths)
actually does the multi-process/node processing. For that to work, it has to store the current state of the process and then restore it on any remote node/process. In practice this means pickling the local variables (Task included).
First I would try to use a standalone static function for the map, DASK might be able to deduce it does not need to pickle anything, as it is standalone.
A...
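The difference can be sketched with stdlib pickle (DASK uses cloudpickle, but the same principle applies; the class and attribute names here are hypothetical stand-ins):

```python
import pickle
import threading

class NetCDFReader:  # hypothetical stand-in for the class discussed above
    def __init__(self):
        # stand-in for an unpicklable handle such as a live Task object
        self._task = threading.Lock()

    def read_and_process_file(self, path):
        return path.upper()

def read_and_process_file(path):
    # standalone module-level function: pickled by reference, no instance state
    return path.upper()

reader = NetCDFReader()

# Shipping the *bound method* to workers forces pickling the whole instance,
# including the unpicklable handle:
try:
    pickle.dumps(reader.read_and_process_file)
    bound_method_picklable = True
except TypeError:
    bound_method_picklable = False

# The standalone function round-trips without dragging any state along:
restored = pickle.loads(pickle.dumps(read_and_process_file))
print(bound_method_picklable, restored("a.nc"))
```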
Are you suggesting just taking the
read_and_process_file
function out of the
read_dataset
method,
Yes 🙂
As for the second option, you mean create the task in the
init
method of the NetCDFReader class?
correct
It would be a great idea to make the Task picklable,
Adding that to the next version to do list 😉
Only those components that are imported in the script where the pipeline is defined would be included in the DAG plot, is that right?
Actually, the way it works currently (and we might change it if there is a better way), every time you call PipelineDecorator.component
a new component is stored on the Pipeline Task, which is later translated into a DAG graph and table (the next version will have a very nice UI to display / edit them).
The idea is first to have a representation of the p...
However, are you thinking of including this callbacks features in the new pipelines as well?
Can you see a good use case ? (I mean the infrastructure supports it, but sometimes too many arguments is just confusing, no?!)
So if any step corresponding to 'inference_orchestrator_1' fails, then 'inference_orchestrator_2' keeps running.
GiganticTurtle0 I'm not sure it makes sense to halt the entire pipeline if one step fails.
That said, how about using the post_execution callback? You could check whether the step failed and then stop the entire pipeline (and any running steps). What do you think?
The new parameter
abort_on_failed_steps
could be a list containing the name of the
I like that, we can also have it as an argument per step (i.e. the decorator can say, abort_pipeline_on_fail or continue_pipeline_processing)
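A rough sketch of the callback approach (parameter and method names follow the ClearML SDK, but treat this as an untested outline; project and step names are hypothetical):

```python
def build_pipeline():
    # imported lazily so this file can be parsed without clearml installed
    from clearml import PipelineController

    def abort_pipeline_on_fail(pipeline, node):
        # hypothetical post-execution callback: if this step's job failed,
        # stop the whole pipeline (and any still-running steps)
        if node.job and node.job.is_failed():
            pipeline.stop(mark_failed=True)

    pipe = PipelineController(name="inference", project="examples", version="0.1")
    pipe.add_step(
        name="inference_orchestrator_1",
        base_task_project="examples",
        base_task_name="orchestrator step",
        post_execute_callback=abort_pipeline_on_fail,
    )
    return pipe
```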
GiganticTurtle0 the fix was not applied in 1.1.2 (which was a hot fix after the pyjwt interface changed and broke compatibility).
The type hint fix is on the latest RC: pip install clearml==1.1.3rc0
I just verified with your example
apologies for the confusion, we will release 1.1.3 soon (we just need to make sure all tests pass with a few PRs that were merged)
Hi JitteryCoyote63 report_frequency_sec=30.
controls how frequently monitoring events are sent to the server; the default is every 30 seconds (you can switch the UI display to wall-time to review). You can change it to 180 so it will only send an event every 3 minutes (for example).
sample_frequency_per_sec is the sampling frequency it uses internally, then it will average the results over the course of the report_frequency_sec
time window, and send the averaged result on the repo...
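A hypothetical sketch of that averaging behaviour (the real implementation differs; this just illustrates samples collected at sample_frequency_per_sec being averaged over each report_frequency_sec window):

```python
def average_reports(samples, sample_frequency_per_sec, report_frequency_sec):
    """samples: raw readings taken sample_frequency_per_sec times per second.
    Returns one averaged value per report_frequency_sec window."""
    window = int(sample_frequency_per_sec * report_frequency_sec)  # samples per report
    return [
        sum(samples[i:i + window]) / len(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]

# 1 sample/sec for 60 seconds, reported every 30 seconds -> two averaged events
readings = list(range(60))
print(average_reports(readings, 1.0, 30.0))  # -> [14.5, 44.5]
```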
Hi SuperiorDucks36
you have such a great and clear GUI
😊
I personally would love to do it with a CLI
Actually a lot of stuff is harder to get from the UI (like the current state of your local repository, etc.). But I think your point stands 🙂 We will start with the CLI, because it is faster to deploy/iterate; then when you guys say this is a winner we will add a wizard in the UI.
What do you think?
StorageHelper is used internally.
I'll make sure we remove it from the examples/docs
So I suppose clearml-agent is not responsible, because it finds a wheel for torch 1.11.0 with cu117.
The thing is, the agent used to do all the heavy parsing because PyTorch never actually had a pip-compatible artifactory.
But now they do, so the agent basically passes the parsing to pip and just adds the correct additional PyTorch pip repo.
It seems we need to switch back... wdyt?
Hi UnsightlySeagull42
Basically you can get the agent to always add additional arguments for the docker run, such as -v for mounting:
https://github.com/allegroai/clearml-agent/blob/948fc4c6ce1ecf33a74619ad570d69b8188f6db9/docs/clearml.conf#L133
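The relevant clearml.conf option looks something like this (paths here are placeholders):

```
agent {
    # Extra arguments appended to every "docker run" the agent launches,
    # e.g. mounting a host directory into the container
    extra_docker_arguments: ["-v", "/host/data:/mnt/data"]
}
```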
This is already part of the docker-compose file,
https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml
@<1541954607595393024:profile|BattyCrocodile47> first let me say I ❤ the dark theme you have going on there, we should definitely add that 🙂
When I run
python set_triggers.py; python basic_task.py
, they seem to execute, b
Seems like you forgot to start the trigger, i.e.
None
(this will cause the entire script of the trigger inc...
Hi @<1541954607595393024:profile|BattyCrocodile47>
Can you trigger a pre-existing Pipeline via the ClearML REST API?
Yes
I'd want to have a Lambda function trigger the Pipeline for a batch without needing to have all the Pipeline code in the lambda function.
Easiest is to use clearml SDK, which basically is clone / enqueue (notice that pipeline is also a kind of a Task). See here: [None](https://github.com/allegroai/clearml/blob/3ca6900c583af7bec18792a4a92592b94ae80cac/example...
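For example, a minimal Lambda-style handler could look like this (a sketch assuming the clearml package is available in the function's environment; the event key and queue name are assumptions):

```python
def lambda_handler(event, context):
    """Hypothetical AWS Lambda entry point: clone a pre-existing pipeline
    Task and enqueue it, without shipping any pipeline code."""
    from clearml import Task  # imported lazily inside the handler

    # event is assumed to carry the ID of the pipeline Task to re-run
    cloned = Task.clone(source_task=event["pipeline_task_id"],
                        name="pipeline run (triggered)")
    # a pipeline is just a kind of Task, so enqueue it like any other Task
    Task.enqueue(cloned, queue_name="services")
    return {"new_task_id": cloned.id}
```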
Hi @<1541954607595393024:profile|BattyCrocodile47>
is this on your self hosted machine ?
Do you have your Task.init
call inside the "train.py" script? (And if you do, what are you getting in the Execution tab of the task?)
Is there a way I could move the JWT authentication (not authorization) logic into an API Gateway or Load Balancer?
Hmm in theory, but not in practice 😞
if ClearML is following OAuth 2.0, t
This is for the SSO part, not for the API, API is only using JWT for verification, the login process itself is with external SSO (OAuth 2.0). But the open-source version does not support SSO 😞
Why are you trying to add another ELB with JWT verification on it ? ...