(I mean new logs; while we are here, did it report any progress?)
So the "packages" are the packages you need in the steps themselves ?
Hi IrritableJellyfish76
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
task_name (str) – The full name or partial name of the Tasks to match within the specified project_name (or all projects if project_name is None). This method supports regular expressions for name matching. (Optional)
You are right, this is a bit confusing, I will make sure that we add in the docstring an examp...
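As a rough illustration of the regular-expression matching described in the quoted docs, here is a minimal sketch. The helper names `name_pattern` and `find_tasks` are made up for this example, and the project name is a placeholder. The pattern is checked locally with Python's `re` only to show the intent; the server-side regex engine may treat some constructs differently.

```python
import re


def name_pattern(prefix: str) -> str:
    """Build a regex that matches task names starting with `prefix`."""
    # "^" anchors the match at the start of the name, so "train" does not match "retrain"
    return "^" + re.escape(prefix)


def find_tasks(project_name: str, prefix: str):
    """Hypothetical helper: return the Tasks in a project whose name starts with `prefix`."""
    # Lazy import, so the pattern helper above can be tried without a configured server
    from clearml import Task

    return Task.get_tasks(project_name=project_name, task_name=name_pattern(prefix))


if __name__ == "__main__":
    # "examples" is a placeholder project name
    for task in find_tasks("examples", "train"):
        print(task.id, task.name)
```

Without the leading `^`, the same string would be treated as a partial name and could match anywhere in the task name.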
Hi GreasyPenguin14
This is what I did, but I could not reproduce the hang, how is this different from your code?
` from multiprocessing import Process
import numpy as np
from matplotlib import pyplot as plt
from clearml import Task, StorageManager
class MyProcess(Process):
    def run(self):
        # in another process
        global logger
        # Create a plot
        N = 50
        x = np.random.rand(N)
        y = np.random.rand(N)
        colors = np.random.rand(N)
        area = ...
Hi SubstantialElk6
Generally speaking, the idea is that actual code creates the Dataset (i.e. a Dataset class instance created from code), and you can add some metric reporting (like table reporting) to create a preview of the stored data for better visibility, or maybe compute some statistics as part of the data ingest script. Then this ingest code can be relaunched / automated. The created Dataset itself can be tagged, renamed, or given key/value pairs for better cataloging. wdyt?
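A minimal sketch of such an ingest script, under a few assumptions: `preview_rows` and `ingest` are hypothetical helper names, the dataset/project names and the tag are placeholders, and the preview table is passed as a list of rows with the header first (a pandas DataFrame should work as well).

```python
def preview_rows(records, max_rows=5):
    """Turn a list of dicts into a small table: header row first, then up to `max_rows` rows."""
    header = sorted({key for record in records for key in record})
    rows = [[record.get(key, "") for key in header] for record in records[:max_rows]]
    return [header] + rows


def ingest(records, files_path, dataset_name="my-dataset", dataset_project="datasets"):
    """Hypothetical ingest step: create a Dataset, attach files, report a preview table."""
    # Lazy import, so the preview helper above can be tried without a configured server
    from clearml import Dataset

    ds = Dataset.create(
        dataset_name=dataset_name,
        dataset_project=dataset_project,
        dataset_tags=["ingested"],  # placeholder tag for cataloging
    )
    ds.add_files(path=files_path)
    # Report a small preview table next to the data for better visibility
    ds.get_logger().report_table(
        title="preview", series="head", iteration=0, table_plot=preview_rows(records)
    )
    ds.upload()
    ds.finalize()
    return ds.id


if __name__ == "__main__":
    sample = [{"label": "cat", "path": "a.png"}, {"label": "dog", "path": "b.png"}]
    print(ingest(sample, files_path="./data"))
```

Because the whole flow lives in one script, it can be relaunched or scheduled like any other Task.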
Can you verify it fixes the timeout issue as well? (or some insight on how to reproduce the issue?)
Sure thing! This feature is all you guys, ask and you shall receive 🙂
LazyLeopard18 could you explain some more on the specific use case you have in mind?
Hmm is this similar to this one https://allegroai-trains.slack.com/archives/CTK20V944/p1597845996171600?thread_ts=1597845996.171600&cid=CTK20V944
Hi WickedGoat98
This sounds like a great design (obviously you have scale in mind 😉 ) Feel free to ask "stupid" questions, based on what you already wrote I doubt they will be
A few questions that come to mind (probably a few others after):
You mentioned FS synchronization, from where? i.e. what is the single source of truth ? K8s (Rancher 2.0 is basically k8s manager) can take care of mounting volumes, so no need to sync, is this a valid solution ?
BTW : (you can drag and drop an i...
A few epochs is just fine
Hmm you will have to set the trains-server on a machine somewhere, it can be any machine win / Mac / Linux
FlutteringWorm14 any insight on the Task that it fails to delete? Or on how to reproduce?
Sounds great! I really like that approach, thanks GrotesqueDog77 !
It may have been killed or evicted or something after a day or 2.
Actually the ideal setup is to have a "services" pod running all these services on a single pod, with clearml-agent --services-mode. This Pod should always be on and pull jobs from a dedicated queue.
Maybe a nice way to do that is to have the single Task serialize itself, then have a Pod run the Task every X hours and spin it down
So I would like to know what it sends to the server to create the task/pipeline, ...
WittyOwl57
To get task IDs (e.g. all the tasks of a specific project), use: `task_ids = Task.query_tasks(project_name="examples", task_filter={'status': ["completed"]})`
Then per task:
` for t_id in task_ids:
    t = Task.get_task(t_id)
    conf_dict = t.get_configuration_as_dict(name="filter")
    task_param = t.get_parameters()
    task_param['filter'] = conf_dict
    # this is to enable to forcefully update parameters post execution
    t.mark_started(force=True)
    # update hyper-parame...
I think it was just pushed; to include nested calls you have to use the new argument for the decorator, helper_functions
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L2392
` from time import sleep
from clearml import Task
import tqdm
task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100)):
    sleep(1)
print('done') `
The above example code will output a line every 10 seconds (with the default console_cr_flush_period=10); can you verify it works for you?
Sure, this is basically a REST query 🙂
` from clearml.backend_api.session.client import APIClient
client = APIClient()
models = client.models.get_all(name='regexp', tags=['demo'], project=['project_id'])
print(models) `
Hi JitteryCoyote63 , I have to admit, we have not thought of this scenario... what's the exact use case to clone a Task and change the type?
Obviously you can always change the task type, a bit of a hack but it should work: `task._edit(type='testing')`
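Putting the clone and the type change together could look roughly like the sketch below. `clone_with_type` is a hypothetical helper, the `task_cls` parameter exists only so the flow can be tried against a stub, and `_edit` is a private method (the hack mentioned above), so it may change between versions.

```python
def clone_with_type(source_task_id, new_type="testing", task_cls=None):
    """Hypothetical helper: clone a Task, then switch the clone's type via the private _edit()."""
    if task_cls is None:
        # Lazy import, so the helper can also be exercised with a stub class
        from clearml import Task as task_cls

    cloned = task_cls.clone(source_task=source_task_id, name="clone as {}".format(new_type))
    # Private API, a bit of a hack: not guaranteed to stay stable
    cloned._edit(type=new_type)
    return cloned


if __name__ == "__main__":
    # "<source-task-id>" is a placeholder, replace it with a real Task ID
    new_task = clone_with_type("<source-task-id>", new_type="testing")
    print(new_task.id)
```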
Hmm ElegantKangaroo44 low memory, that might explain the behavior
BTW: 1 = stop request, 3 = Task Aborted/Failed
Which makes sense if it crashed on low memory...
In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?
Yes exactly! it should be very easy
Just Inherit from RandomSearch and change create_job
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/clearml/automation/optimization.py#L1043
UnevenDolphin73 since at the end plotly is doing the presentation, I think you can provide the extra layout here:
https://github.com/allegroai/clearml/blob/226a6826216a9cabaf9c7877dcfe645c6ae801d1/clearml/logger.py#L293
If a Task is in the 'Completed' state, I think the only option is to 'Reset' it (see image).
In the UI yes, in code you can do task.mark_aborted(force=True)
You do clear the previous run execution but I think for a repetitive task this is fine.
I would avoid that, no?