Hi ZealousCoyote89
We'd like to have pipeline A trigger pipeline B
Basically a Pipeline is a Task (of a specific type), so the pipeline A function can clone/enqueue the pipeline B Task and wait until it is done. wdyt?
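A minimal sketch of that clone/enqueue/wait flow, assuming the standard clearml SDK calls (`Task.get_task`, `Task.clone`, `Task.enqueue`, `wait_for_status`). The function name `run_pipeline_b`, the queue name `"services"`, the clone name, and the `task_cls` parameter (there only so the flow can be dry-run without a server) are made up for illustration:

```python
def run_pipeline_b(pipeline_b_task_id, queue_name="services", task_cls=None):
    """Clone pipeline B's controller Task, enqueue the clone, and wait for it."""
    if task_cls is None:
        # Imported lazily so the sketch can be dry-run without a ClearML server.
        from clearml import Task as task_cls

    # Pipeline B is itself a Task, so it can be fetched by its Task ID.
    template = task_cls.get_task(task_id=pipeline_b_task_id)
    # Clone it so the original pipeline B Task stays untouched.
    cloned = task_cls.clone(source_task=template, name="pipeline B (triggered by A)")
    # Send the clone to a queue served by an agent.
    task_cls.enqueue(cloned, queue_name=queue_name)
    # Block until the clone reaches a final state.
    cloned.wait_for_status()
    return cloned
```

Calling `run_pipeline_b("<pipeline B task id>")` from inside a pipeline A step should then block that step until pipeline B is done.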
Hi DeliciousBluewhale87
I think we had a docker that does exactly that, and then you would spin up the docker as a k8s service. Is this what you are referring to?
This would be my only improvement, otherwise awesome!!!
output_model.update_weights(weights_filename=os.path.join(training_data_path, 'runs', 'train', 'yolov5s6_results', 'weights', 'best.onnx'))
How did you add the args? Is it argparse? If so, the help is picked up automatically so you can see it in the UI. BTW, the ability to provide a list of options is a really cool feature to have, I'll make sure to pass it to product
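For reference, a small argparse sketch with help strings and a fixed list of options via `choices`. The argument names and defaults are invented for illustration, and `choices` here only restricts values on the argparse side; it is not a claim about how the UI presents them:

```python
import argparse


def build_parser():
    """Build an example parser; the argument names are illustrative only."""
    parser = argparse.ArgumentParser(description="Training script (illustrative)")
    parser.add_argument(
        "--lr", type=float, default=0.001,
        help="Learning rate for the optimizer",
    )
    parser.add_argument(
        "--optimizer", choices=["sgd", "adam"], default="adam",
        help="Optimizer to use (argparse rejects anything outside this list)",
    )
    return parser


# In a real script you would call Task.init(...) before parsing,
# then parse sys.argv with build_parser().parse_args().
args = build_parser().parse_args(["--lr", "0.01", "--optimizer", "sgd"])
```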
Wtf? can you try with = (notice single not double)?
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0
Hi GreasyPenguin14
Could you tell me what the differences are and why we should use ClearML data?
The first difference is in the approach itself, DVC ties the data with the code (i.e. git repo), where we (ClearML - but not just us) actually think data should be abstracted from the Code-Base and become a standalone argument, allowing users to build/execute against different dataset/versions. ClearML Data becomes part of the workflow as it is visible from the UI including the abili...
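A rough sketch of the "dataset as a standalone argument" idea, assuming the clearml `Dataset.get(...)` and `get_local_copy()` calls. The helper name `fetch_dataset`, the parameter keys, and the project/dataset names are hypothetical, and `dataset_cls` is only there so the flow can be dry-run without a server:

```python
def fetch_dataset(params, dataset_cls=None):
    """Resolve a dataset from plain parameters and return a local folder path."""
    if dataset_cls is None:
        # Imported lazily so the sketch can be dry-run without a ClearML server.
        from clearml import Dataset as dataset_cls

    dataset = dataset_cls.get(
        dataset_project=params["dataset_project"],
        dataset_name=params["dataset_name"],
    )
    # Returns a path to a cached, read-only local copy of the dataset files.
    return dataset.get_local_copy()


# The dataset is just another parameter of the run (names are made up);
# changing these values points the same code at a different dataset.
example_params = {"dataset_project": "examples", "dataset_name": "my_dataset"}
```

Because the dataset is selected by parameters rather than by the git state, the same commit can be executed against different datasets or versions.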
VictoriousPenguin97 I'm not sure there is an easy solution, basically you have to edit both MongoDB (artifacts) and Elastic (think debug samples)
Hi CluelessElephant89
Hi guys, if I spot issues with the documentation, where should I post them?
The best way from our perspective is to PR the fix, this is why we put it on GitHub
But it's running in docker mode and it is trying to ssh into the host machine and failing
It is not sshing to the machine, it is sshing directly into the container.
Notice the port it is sshing to is 10022, which is mapped into the container
Hi OutrageousGrasshopper93
Are you working with venv or docker mode?
Also notice that if you need all GPUs you can pass --gpus all
these are being repeated as well for a single task (this is training a t5_model with transformers):
Seems like someone is storing lots of files with torch.save that ClearML automatically logs.
You can disable the autolog:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})
Lol yeah Hydra is great. Notice you still have the ability to override Hydra from the UI so you really have the best of the two worlds
RobustGoldfish9
I think you need to set the trains-agent docker to be aware of the host, so it knows how to mount data/cache/configurations into the sibling docker
It should look something like:
TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains"
So if running a docker:
docker run -e TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains" ...
Import Error sounds so out of place it should not be a problem :)
Hi TastyOwl44
So this depends on your code itself, but usually you need a CPU machine to run the ClearML server (or use the free community server), then a machine to run the pipeline controller (usually the same machine running the clearml-server, as the pipeline controller code only orchestrates and does not execute the Tasks itself), and lastly machines with GPUs running the clearml-agent (these GPU machines are the ones actually doing the training, inference, etc.)
Make ...
Obviously if you click on them you will be able to compare based on specific metric / parameters (either as table or in parallel coordinates)
Is there an easy way to add a docker argument in the python script?
On the Task itself in the UI you can edit the docker arguments and add any missing flags
(task.set_base_docker will do the same from code)
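Something along these lines from code, assuming a recent clearml SDK where `set_base_docker` accepts the `docker_image` and `docker_arguments` keywords (older releases took a single combined `docker_cmd` string, so check your version). The helper name, image, and flag below are placeholders:

```python
def add_docker_args(task, image, extra_args):
    """Set the container image and extra docker flags used when an agent runs this Task.

    `task` is expected to be a clearml Task, e.g. the object returned by Task.init(...).
    """
    task.set_base_docker(docker_image=image, docker_arguments=extra_args)
    return task


# Typical usage inside the training script (placeholder image and flag):
#   task = Task.init(project_name="examples", task_name="docker args demo")
#   add_docker_args(task, "python:3.9", "--ipc=host")
```

The setting only matters when the Task is later executed by an agent in docker mode; it has no effect on a local run.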
You can also edit the configuration and always add this flag:
None
I think there is a bug on the UI that causes series with "." to only use the first part of the series name for the color selection. This means "epsilon 0" and "epsilon 0.1" will always get the same color, and this will explain why it works on other graphs
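To illustrate the suspected collision (this is not the actual UI code, just a hypothetical Python sketch of the behavior described): if the color key is derived from the text before the first ".", both series names collapse to the same key and therefore the same color. The function names and palette are made up:

```python
import hashlib


def color_key_suspected(series_name):
    """Hypothetical buggy key: keeps only the part before the first '.'."""
    return series_name.split(".")[0]


def color_key_full(series_name):
    """Key based on the full series name, so 'epsilon 0.1' stays distinct."""
    return series_name


def pick_color(key, palette):
    """Deterministically map a key to one of the palette colors."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return palette[int(digest, 16) % len(palette)]


palette = ["red", "green", "blue", "orange", "purple"]
key_a = color_key_suspected("epsilon 0")
key_b = color_key_suspected("epsilon 0.1")
same_color = pick_color(key_a, palette) == pick_color(key_b, palette)
```

With the suspected keying, `key_a` and `key_b` are both `"epsilon 0"`, so the two series always end up with the same color.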
JitteryCoyote63
Picks a new experiment on top of the long one running
This is very very strange. Is the long running experiment being logged (i.e. do you still see console output in the UI)?
If possible, can we have a "only one experiment can be given a single tag"
You mean "moving a tag" automatically (i.e. if someone else had the same tag it is removed from it)?
Will they get ordered ascending or descending?
Good point, I'll check the docs... but I think they do not specify
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
From the code it seems the order is not guaranteed.
You can however pass 'order_by': ['-last_update'], which will give you the most recently updated first
` task_filter = {
'page_size': 2,
'page': 0,
'order_by': ['last_metrics.{}.{}'.format(title, series), '-last_update']
}
Task.get_tasks(...
I guess. or pipelines that you can compose after running experiments to see that experiments are connected to each other
hmm what do you mean by "compose after running experiments" ? like a way to group them? what is the relation between one "item" to another ?
If this is a sequence of Tasks , are they executed by a controller ?
`
Example use case:
an_optimizer = HyperParameterOptimizer(
    # This is the experiment we want to optimize
    base_task_id=args['template_task_id'],
    # here we define the hyper-parameters to optimize
    hyper_parameters=[
        UniformIntegerParameterRange('General/layer_1', min_value=128, max_value=512, step_size=128),
        UniformIntegerParameterRange('General/layer_2', min_value=128, max_value=512, step_size=128),
        DiscreteParameterRange('General/batch_size', values=[...
Will be released shortly with the new RC :)
My pleasure, and apologies
BroadMole98
I'm still exploring what trains is for.
I guess you can think of Trains as Experiment manager + MLOps tied together.
The idea is to give a quick and easy way to move from coding/running on one machine to scaling it to multiple remote machines, with everything that comes with it.
In some ways it is like snakemake: it sets up your environment and executes the code. Snakemake also allows you to set up data, which in Trains is done via code (StorageManager), pipelines are also...
JitteryCoyote63
I agree that its name is not search-engine friendly,
LOL
It was an internal joke, the guys decided to call it "trains" 'cause, you know, it trains...
It was unstoppable, we should probably do a line of merch with AI
Anyhow, this one definitely backfired...
Honestly, this is all related to issue #340.
makes total sense.
But actually this is different from #340. The feature is to store the data on the Task, which means each Task in your "pipeline" will upload a new copy of the data. No?
I'd suggest some task.detach() method for remote execution maybe
That is a good idea, in theory it can also be used in local execution
Unfortunately not yet in venv mode. What would you have put there?