GrotesqueDog77 this should just work: decorate the functions with @PipelineDecorator.component
and call the functions one after the other: `paths = step_one()` then `step_two(paths)`
ClearML will make sure it serializes the strings and passes them to step two (of course step two should actually run on a machine with access to the same folder, but this is another issue 🙂 )
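Something along these lines should do it (a minimal sketch; the project/pipeline names and the file paths are just placeholders):
```
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["paths"])
def step_one():
    # produce whatever step two needs, e.g. a list of file paths
    return ["/data/part_a.csv", "/data/part_b.csv"]

@PipelineDecorator.component()
def step_two(paths):
    # receives the serialized output of step_one
    print("processing", paths)

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="1.0")
def run_pipeline():
    paths = step_one()
    step_two(paths)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # drop this line to launch the steps on agents
    run_pipeline()
```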
Ohh, clearml is designed so that you should not worry about that: `download_dataset = StorageManager.get_local_copy()`
is cached, meaning the machine that runs that line the second time will not re-download the data.
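For example (the remote URL here is just an illustration):
```
from clearml import StorageManager

# downloads once, then returns the locally cached copy on subsequent calls
download_dataset = StorageManager.get_local_copy(remote_url="s3://my-bucket/my_dataset.zip")
```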
This means step 1 is redundant, no?
Usually when data is passed between components it is automatically uploaded as an artifact on the Task (stored on the files server or object storage etc.), then downloaded and passed to the next steps.
How large is the data that you are wo...
HandsomeCrow5
`client.events.debug_images(metrics=[dict(task='6adb929f66d14731bc76e3493ab89d80', metric='image')])`
`metric='image'` is the name in the dropdown of the debug images
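For reference, the full call with the APIClient would look roughly like this (the task ID is taken from the snippet above, the rest is a plain sketch):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# 'metric' is the name shown in the debug-images dropdown in the UI
response = client.events.debug_images(
    metrics=[dict(task='6adb929f66d14731bc76e3493ab89d80', metric='image')]
)
```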
Hi DilapidatedDucks58 ,
Just making sure, do all 8 workers have different worker ids? (you can see all 8 in the Workers page in the UI)
Also, are they running in docker or venv mode?
, how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?
A bit on how clearml-agent works (and actually on how clearml itself works):
When running manually (i.e. not executed by an agent), Task.init (and similarly task.connect etc.) will log data on the Task itself (i.e. will send arguments/parameters to the server). This includes logging the argparser for example (and any other part of the automagic or manual connect).
When run...
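In code the manual run looks something like this (project name and arguments are just an example):
```
from argparse import ArgumentParser
from clearml import Task

# when run manually, Task.init logs everything on the Task itself
task = Task.init(project_name="examples", task_name="manual run")

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
args = parser.parse_args()  # the argparser values are captured and sent to the server

# when the same Task is later executed by an agent, these values are
# overridden with whatever was edited on the cloned Task in the UI
```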
Hi UpsetCrocodile10
First, I perform many experiments in one process, ...
How about this one:
https://github.com/allegroai/trains/issues/230#issuecomment-723503146
Basically you could utilize create_function_task
This means you have Task.init() on the main "controller" and each "train_in_subset" as a "function_task". Then the controller can wait on them, and collect the data (like the HPO does).
Basically:
```
controller_task = Task.init(...)
children = []
for i, s in enumer...
```
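A fuller sketch of the idea (the subset splitting and the training function body are placeholders):
```
from clearml import Task

def train_in_subset(subset_id):
    # training logic for a single subset goes here
    print("training on subset", subset_id)

controller_task = Task.init(project_name="examples", task_name="controller")

children = []
for i, s in enumerate(subsets):  # 'subsets' is whatever split you defined
    child = controller_task.create_function_task(
        func=train_in_subset,
        func_name="train_in_subset_{}".format(i),
        task_name="train subset {}".format(i),
        subset_id=s,
    )
    children.append(child)

# the controller can now wait on the children and collect their reported results
```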
Questions
I want to trigger a retrain task when F1
That means that in inference you are reporting the F1 score, correct?
As part of the retraining I have to train all the models and then have to choose best one and deploy it
Are you passing output_uri to Task.init? Are you storing the model as an artifact?
You can tag your model/task with a "best" tag (and untag the previous one). Then in production, look for the task tagged "best" and get its model.
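A rough sketch of what that could look like (project name, tag handling and the model lookup are assumptions to adapt):
```
from clearml import Task

# untag the previous "best" task(s)
for t in Task.get_tasks(project_name="my_project", tags=["best"]):
    t.set_tags([tag for tag in t.get_tags() if tag != "best"])

# tag the newly selected winner (new_best_task_id is a placeholder)
Task.get_task(task_id=new_best_task_id).add_tags(["best"])

# in production: fetch the "best" task and pull its latest output model
best = Task.get_tasks(project_name="my_project", tags=["best"])[0]
model_path = best.models["output"][-1].get_local_copy()
```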
Thoughts?
Something like the TYPE_STRING that Triton accepts.
I saw the github issue, this is so odd, look at the triton python package:
https://github.com/triton-inference-server/client/blob/4297c6f5131d540b032cb280f1e[…]1fe2a0744f8e1/src/python/library/tritonclient/utils/init.py
yes that makes sense, I think what happened is one of the processes completed the Task (i.e. closed it) before the others did, and so they threw an exception.
I switched to have all tasks in a separate process
I think that's probably the best (performance wise as well), nice!
Hi DeliciousBluewhale87
I think you are correct, there is no way to pass it.
As TimelyPenguin76 mentioned you can either set a default output_uri on the agent's config file, or edit the created Task in the UI.
What is the specific use case ? Maybe we should add this ability, wdyt?
🤔 maybe we should have "sub nodes" as just visual functions running inside the same actual pipeline component ?
It will always set its own environment, either with static analysis or with "pip freeze" / "conda freeze".
It needs to log the exact setup that was actually installed.
When you later launch it on a remote machine, it can either use this to recreate the environment (using pip or conda), or you can clear the entire section, in which case it will fall back to "requirements.txt".
Any reason for specifically using the "environment.yaml" ?
python version to be used and conda will install it
clearml does that automatically (albeit it is not shown in the UI, which should be fixed)
Basically the links to the file server are saved in both mongo and elastic, so as long as these are host:ip based, at least in theory it should work
Hi CheerfulGorilla72
I guess this is a documentation bug, is there a stable link for the latest docker-compose ?
Yes, there is no real limit, I think the only requirement is docker v19+.
the only thing that is missing is some plots on the clearml server (app): when I go to the details of the training I cannot see the confusion matrix for example (but it exists on the bucket)
How do you report the "confusion matrix"? (I might have an idea on what's the difference)
Are you saying that in the UI you do not see the "confusion matrix" at all, only on the GS bucket?
I think you are correct, it seems like it is missing the requirements for boto/azure/google (I will make sure this is added). In the meantime, you can stop the "triton serving engine" Task, reset it, add boto3 to the installed packages and relaunch.
That said your main issue might be packaging the python model. Basically you need to create a model from the entire folder (with whatever there is inside the folder), then Triton should be able to run it (if the config.pbtxt is correct).
` m = OutputMo...
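Something along these lines (the folder path and framework name are placeholders):
```
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="package python model")

# register the whole model folder (config.pbtxt included) as a single
# zipped weights package on the Task
m = OutputModel(task=task, framework="custom")
m.update_weights_package(weights_path="/path/to/model_folder")
```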
Hmm I guess doable 🙂 could you open a github issue with feature request ?
If we have enough support it will bump it in the priority 🤞
ohh, could it be a 32bit version of python ?
Ohh, then yes, you can use the https://github.com/allegroai/clearml/blob/bd110aed5e902efbc03fd4f0e576e40c860e0fb2/clearml/automation/monitor.py#L10 class to monitor changes in the dataset/project
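A very rough sketch of subclassing it (method names as I recall them from the Monitor base class, worth double-checking against the linked source):
```
from clearml.automation.monitor import Monitor

class DatasetMonitor(Monitor):
    # called for every newly detected Task matching the monitored projects
    def process_task(self, task):
        print("change detected:", task.id, task.name)

m = DatasetMonitor()
m.set_projects(project_names=["my_dataset_project"])  # assumed setter, see the class source
m.monitor()  # polls the server periodically
```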
Seems correct.
I'm assuming something is wrong with the key/secret quoting ?!
Could you generate another one and test it ?
(you can have multiple key/secrets on the same user)
Could it be you have old OS environment overriding the configuration file ?
Can you change the IP of the server in the conf file, and make sure it has an effect (i.e. the error changed)?
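A quick way to check for leftover environment overrides (these are the common variable names, there may be others):
```
import os

# any of these, if set, take precedence over clearml.conf
for key in ("CLEARML_API_HOST", "CLEARML_WEB_HOST", "CLEARML_FILES_HOST",
            "CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY",
            "TRAINS_API_HOST", "TRAINS_API_ACCESS_KEY", "TRAINS_API_SECRET_KEY"):
    if os.environ.get(key):
        print("override found:", key)
```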
Any insight will help, if you can provide the log of the Task that did get stuck, that would be a good start