Ohh, two options:
From the script itself you can do:
```python
from clearml import Task

task = Task.init(...)
task.execute_remotely(queue_name='default')
```
Then run the script locally; it will run until the `execute_remotely` call, quit the process, and re-launch it on the "default" queue.
Option B:
Use the clearml-task CLI:
$ clearml-task --folder <where the script is> --project ...
See https://github.com/allegroai/clearml/blob/master/docs/clearml-task.md#launching-a-job-from-a-local-script
Hi VirtuousFish83
Apologies for the confusing documentation 🙂 It sounds complicated but actually should be relatively simple. Based on what I understand, you already have the server set up and your code integrated. The question is "can you see an experiment in the UI"? If you do, then you can right click it, clone the experiment, edit parameters and send for execution (enqueue). If the experiment is not in the UI you can either (1) run the code with the Task.init call, it will automatica...
Woot woot!
awesome, this RC is stable, so feel free to use it; the official release should be out next week :)
Check on which queue the HPO puts the Tasks, and if the agent is listening to these queues
So this is why 🙂
an agent can only run one Task at a time.
The HPO (being a Task on its own) should run on the "services" queue, where the agent can run multiple "cpu controller" Tasks like the HPO.
Make sense ?
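To make that concrete, here is a minimal sketch of that layout; the queue names, parameter range and metric names are placeholders, not from the original thread:
```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

# the HPO controller is itself a Task
task = Task.init(project_name='examples', task_name='HPO controller',
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id='<base_task_id>',  # the experiment template to optimize
    hyper_parameters=[UniformParameterRange('General/lr', min_value=1e-4, max_value=1e-1)],
    objective_metric_title='validation',
    objective_metric_series='accuracy',
    objective_metric_sign='max',
    execution_queue='default',  # the queue the training agents listen on
)

# run the controller on the "services" queue, where the agent can run
# multiple CPU controller Tasks concurrently
task.execute_remotely(queue_name='services')

optimizer.start()
optimizer.wait()
optimizer.stop()
```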
Hi OutrageousGiraffe8
I was not able to reproduce 😞
Python 3.8 Ubuntu + TF 2.8
I get both metrics and model stored and uploaded
Any idea?
Hi @<1533620191232004096:profile|NuttyLobster9>
base_task_factory is a function that gets the node definition and returns a Task to be enqueued,
pseudo code looks like:
```python
from clearml import Task
from clearml.automation.controller import PipelineController

def my_node_task_factory(node: PipelineController.Node) -> Task:
    task = Task.create(...)
    return task
```
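For context, the factory is typically passed to add_step in place of a fixed base task; a hedged usage sketch (the pipeline and step names are made up):
```python
from clearml.automation.controller import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0')
# instead of pointing at a fixed base task, let the factory build the Task per node
pipe.add_step(name='train', base_task_factory=my_node_task_factory)
pipe.start()
```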
Make sense ?
Finally managed; you keep saying "all projects" but you meant the "All Experiments" project instead. That's a good start
 Thanks!
Yes, my apologies, you are correct: "All Experiments"
do I still need to specify a OutputModel
No need, only if you want to upload a local model file (but I assume in this case, no new model is created)
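For completeness, if you did want to upload a local model file manually, a hedged sketch (project, task and file names are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(project_name='examples', task_name='manual model upload')
output_model = OutputModel(task=task)
# registers the local weights file on the Task and uploads it
output_model.update_weights(weights_filename='model.pt')
```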
is there GPU support
That basically depends on your template yaml resources. You can have multiple of those, each one "connected" with a different glue pulling from a different queue. This way the user can enqueue a Task in a specific queue, say single_gpu, then the glue listens on that queue, and for each ClearML Task it creates a k8s job with the single GPU as specified in the pod template yaml, as in the sketch below.
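For illustration, a minimal sketch of such a pod template yaml (the container name and GPU count are assumptions):
```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: clearml-task
      resources:
        limits:
          nvidia.com/gpu: 1  # the single GPU for Tasks pulled from the "single_gpu" queue
```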
hmm that is odd, let me check
Hi GentleSwallow91
I am very much concerned with docker container spin up time.
To accelerate spin up time (mostly pip install) use the venv caching (basically it will store a cache of the entire installed venv so it does not need to reinstall it).
Uncomment this line:
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L116
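After uncommenting it, that part of clearml.conf should look roughly like this (the exact layout may differ between agent versions):
```
agent {
    venvs_cache: {
        # unmark to enable virtual environment caching
        path: ~/.clearml/venvs-cache
    }
}
```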
The problem above could be that I used a non-root user to train a model and all packages are installed for ...
Essentially, I think the key thing here is we want to be able to build the entire Pipeline including any updates to existing pipeline steps and the addition of new steps without having to hard-code any Task ID’s and to be able to get the pipeline’s Task ID back at the end.
Oh, if this is the case then basically your CI/CD code will be something like:
```python
from clearml import TaskTypes
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_...
```
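And the driver around that component could look roughly like the sketch below; this assumes the controller runs in the CI process itself, and all names, queues and the data path are placeholders:
```python
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name='ci pipeline', project='examples', version='1.0')
def run_pipeline(data_path: str):
    data_frame = step_one(data_path)
    ...

if __name__ == '__main__':
    PipelineDecorator.set_default_execution_queue('default')
    run_pipeline('s3://bucket/data.pkl')
    # the pipeline controller is itself a Task; no hard-coded IDs needed
    print(Task.current_task().id)
```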
but I was wondering if there's any limitation in creating an image with a non root user to use as the actual worker?
SarcasticSquirrel56 non-root pods (containers) are fully supported,
I would recommend using the latest agent RC (that simplified a few things):
clearml-agent==1.4.0rc3
I see... because the problem would be with permissions when creating artifacts to store in the "/shared" folder
You mean as output target for artifacts ?
especially for datasets (for th...
ElegantCoyote26 can you browse to http://localhost:8080 on the machine that was running the trains-init ?
Hi @<1523701079223570432:profile|ReassuredOwl55> let me try to add some color here:
Basically we have two parts: (1) pipeline logic, i.e. the code that drives the DAG, (2) pipeline components, e.g. model verification
The pipeline logic (1), i.e. the code that creates the DAG, the tasks and enqueues them, will be running in the GitHub Actions context, i.e. this is the automation code. The pipeline components themselves (2), e.g. model verification, training etc., are running using the clearml agents...
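A hedged sketch of that split, assuming a pre-existing "model verification" Task used as the step template (all names here are placeholders):
```python
from clearml.automation.controller import PipelineController

# (1) pipeline logic: runs inside the CI job
pipe = PipelineController(name='release pipeline', project='examples', version='1.0')
# (2) pipeline components: each step is cloned from a template Task and runs on an agent
pipe.add_step(name='verify_model', base_task_project='examples', base_task_name='model verification')
pipe.start_locally()  # DAG logic stays in the CI context; steps still execute on the agents
```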
right now I can't figure out how to get the session in order to get the notebook path
you mean the code that fires "HTTPConnectionPool" ?
(I mean new logs; while we are here, did it report any progress?)
This one seems to work:
```python
from clearml import Task

task = Task.init(...)

import matplotlib.pyplot as plt
import numpy as np

plt.style.use('_mpl-gallery')

# make data:
np.random.seed(10)
D = np.random.normal((3, 5, 4), (0.75, 1.00, 0.75), (200, 3))

# plot:
fig, ax = plt.subplots()
vp = ax.violinplot(D, [2, 4, 6], widths=2,
                   showmeans=False, showmedians=False, showextrema=False)

# styling:
for body in vp['bodies']:
    body.set_alpha(0.9)
ax.set(xlim=(0, 8), xticks=np.arang...
```
So net-net does this mean it’s behaving as expected,
It is as expected.
If no "Installed Packages" are listed, then it cannot pull a cached venv (because requirements.txt is not a full env, and it never analyzed it)).
It does however create a venv cache based on it (after installing it)
The Clone of this Task (i.e. right click on the UI clone experiment, enqueue it, Will use the cached copy becuase the full packages are listed in the "Installed Packages" section of the Task.
Make sens...
do you know how can i save all the logs and all the metric images?
These are stored into clearml-server, no? what am I missing ?
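If you also want to pull them back programmatically, a hedged sketch (the task ID is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id='<task_id>')
scalars = task.get_reported_scalars()          # all reported metric series
console = task.get_reported_console_output()   # captured console log lines
```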
Try to add here:
```python
server_info['url'] = f"http://{server_info['hostname']}:{server_info['port']}/"
```
FYI:
```
ssh -R 8080:localhost:8080 -R 8008:localhost:8008 -R 8081:localhost:8081 replace_with_username@ubuntu_ip_here
```
solved the issue 🙂
Yes, I mean trains-agent. Actually I am using 0.15.2rc0. But I am using local files, I mean I clone the trains and trains-agent repos and install them. Their versions are 0.15.2rc0
I see, that's why we get the git ref, not package version.