Yep, the automagic only kicks in with Task.init... The main difference, and the advantage of using a Dataset object, is that the underlying Task resides in a specific structure that is used when searching based on project/name/version, but other than that, it should just work
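For reference, a minimal sketch of that project/name/version lookup (the project and dataset names are placeholders, and this assumes a recent clearml version where Dataset.get accepts dataset_version):
from clearml import Dataset

# look up an existing dataset by project / name / version (names are hypothetical)
ds = Dataset.get(
    dataset_project="examples",
    dataset_name="my_dataset",
    dataset_version="1.0.0",
)
print(ds.id)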
(once you verify the fix in the PR, I'll make sure it is merged)
Hi @<1631102016807768064:profile|ZanySealion18>
I'm using SSH for authentication, however, known_hosts doesn't seem to be passed to the docker so it prompts for authentication/fingerprint. Any ideas?
Hmm it is supposed to automatically mount your ~/.ssh folder into the docker to solve for that.
First try to set force_git_ssh_protocol: true
None
If that does not he...
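For reference, this is roughly where that flag lives in clearml.conf on the machine running the agent (a sketch, assuming the standard config layout):
agent {
    # convert HTTPS git URLs to SSH so the mounted ~/.ssh keys are used inside the docker
    force_git_ssh_protocol: true
}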
instead of the one that I want, or the one of the environment it is started from.
The default is the python that is used to run the agent.
agent.ignore_requested_python_version = true
agent.python_binary = /my/selected/python3.8
Oh that is odd. Is this reproducible? @<1533620191232004096:profile|NuttyLobster9> what was the flow that required another Task.init?
Ooops 😞
task.get_tags()
task.set_tags()
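A quick sketch of how those two calls fit together (the task id is a placeholder):
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")   # hypothetical task id
tags = task.get_tags()                            # current list of tags
task.set_tags(tags + ["reviewed"])                # overwrite the tag list with an extra tag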
I can't seem to figure out what the names should be from the pytorch example - where did INPUT__0 come from?
This is actually the layer name in the model:
https://github.com/allegroai/clearml-serving/blob/4b52103636bc7430d4a6666ee85fd126fcb49e2e/examples/pytorch/train_pytorch_mnist.py#L24
Which is just the default name PyTorch gives the layer
https://discuss.pytorch.org/t/how-to-get-layer-names-in-a-network/134238
it appears I need to convert it into TorchScript?
Yes, this ...
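The reply above is cut off, but for context, a minimal runnable sketch of tracing a model into TorchScript (the stand-in model and output file name are made up, not the linked example's actual Net):
import torch
import torch.nn as nn

# stand-in for the Net() model from the linked MNIST example
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

example_input = torch.randn(1, 1, 28, 28)        # dummy MNIST-shaped input
traced = torch.jit.trace(model, example_input)   # trace the model into TorchScript
traced.save("serving_model.pt")                  # hypothetical output file name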
I was using clearml == 0.17.5 and I also had this issue
I think it was introduced when we moved to subprocess reporting, with 0.17.5
You can disable it with the following in clearml.conf:
sdk.development.report_use_subprocess = false
For setting up the trains-server I would recommend the docker-compose; it is very easy to set up, and you just need a single fixed compute instance. Details: https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md
With regards to the "low prio clusters", are you asking how they could be connected with the trains-agent, or whether running code that uses trains will work on them?
Hi SkinnyPanda43
Are you trying to access the same Task or an external one ?
Ohh "~/trains.conf" is root probably
So does that mean "origin" solves the issue ?
great 🙂
two things:
1. I'm not sure argparse supports dict as a type (I mean it will take anything, but I'm not sure it will parse your arguments as a dict). 2. I know there was an issue with argparse parsing, but I think it was solved.
BTW: basically the way clearml-agent works, it does not actually pass the arguments on the command line but feeds them directly to the argparser at runtime
What happens if you clone the Task (the one with Args showing and without the explicit task.connect(_args)) and send it to the age...
Hi FancyWhale93 you can disable the auto model uploading with:
@PipelineDecorator.component(..., auto_connect_frameworks={'pytorch': False})
def step():
    pass
clearml-task seems to not allow me to pass the run argument without a value
EnviousStarfish54 did you try --args run=True
I'm assuming run is a boolean of sorts?
a. The submitted job would automatically download data from an internal data repository, but it will be time consuming if the data is re-downloaded every time. Does ClearML cache the data somewhere?
What do you mean by the agent will download the data ? are you referring to Dataset ?
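If it is the Dataset object, a quick sketch of how the local cache behaves (project/dataset names are placeholders):
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
# downloads once into the local clearml cache; subsequent calls on the
# same machine reuse the cached copy instead of re-downloading
local_path = ds.get_local_copy()
print(local_path)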
MysteriousBee56 I would do Task.create()
you can get the full Task internal representation with task.data
Then call task._edit(script={'repo': ...}) to edit/update all the Task entries.
You can check the full details of the task object here: https://github.com/allegroai/trains/blob/master/trains/backend_api/services/v2_8/tasks.py#L954
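Putting that together, a rough sketch (project/task names and the repo URL are placeholders; note that _edit is an internal call, so treat its usage here as an assumption):
from clearml import Task

task = Task.create(project_name="examples", task_name="manually created task")
print(task.data)  # full internal representation of the Task

# update the script/repo information in place (internal API, per the snippet above)
task._edit(script={'repo': 'https://github.com/allegroai/clearml.git', 'branch': 'master'})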
BTW: when you have a sample script working, consider PR-ing it, I'm sure it will be useful for others 🙂 (also a great way to get us involved with debuggin...
How does ClearML select reference branch? Could it be that ClearML only checks "origin" branch?
Yes 😞 I think we can quickly fix that; I'm just trying to figure out if there are downsides to running "git ls-remote --get-url" without origin
Oh that makes sense. This depends on how you set up the clearml k8s glue (because the resource allocation is done by k8s). A good hack to limit the number of containers per GPU is to set a RAM limitation per pod; then k8s will know to limit the number of pods on the same GPU machine.
wdyt?
I think this is the only mount you need:
Data persisted in every Kubernetes volume by ClearML will be accessible in /tmp/clearml-kind folder on the host.
SuccessfulKoala55 is this correct ?
Notice the error code:
Action failed <400/401: tasks.create/v1.0 (Invalid project id: id=first_attempt)>
If that is the case, the project ID is incorrect (the project id is not the project name)
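A small sketch of resolving the actual project id from the project name (assuming a recent clearml version; "first_attempt" is just the name from the error above):
from clearml import Task

# look up the real project id for the project named "first_attempt"
project_id = Task.get_project_id(project_name="first_attempt")
print(project_id)  # pass this id, not the name, to tasks.create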
Ephemeral Dataset, I like that! Is this like splitting a dataset (for example), then training/testing, and deleting it when done? Making sure the entire pipeline is reproducible, but without storing the data long term?
will my datasets be stored on the same machine that hosts the clearml server?
By default yes, they will be stored to the files-server (but you can change it, this is an argument for both the CLI and the python interface)
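For example, a minimal sketch of pointing a dataset at a different storage target (the local folder and S3 bucket are hypothetical):
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files("/path/to/local/data")               # placeholder local folder
ds.upload(output_url="s3://my-bucket/datasets")   # hypothetical bucket instead of the files-server
ds.finalize()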
so I guess this could be one reason to start thinking about upgrading ....
Wait you mean the clearml-server ? (there is no reason not to upgrade the python package)
GrotesqueDog77 this should just work, decorate the functions with @PipelineDecorator.component and call the functions one after the other:
paths = step_one()
step_two(paths)
ClearML will make sure it serializes the strings and passes them to step two (of course step two should actually run on a machine with access to the same folder, but this is another issue 🙂 )
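A slightly fuller sketch of that pattern (project name, file paths and the local run are placeholders for illustration):
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["paths"])
def step_one():
    # placeholder: produce something step two needs
    return ["/shared/data/file_a.txt", "/shared/data/file_b.txt"]

@PipelineDecorator.component()
def step_two(paths):
    print("got", paths)

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="1.0")
def run_pipeline():
    paths = step_one()
    step_two(paths)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run the steps locally instead of enqueuing to agents
    run_pipeline()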
By default an SSH server is not running in a lot of scenarios (k8s for example, Windows, macOS)...
Hi JitteryCoyote63
could you check if the problem exists in the latest RC?
pip install clearml==1.0.4rc1
I am creating this user
Please explain, I think this is the culprit ...
CharmingBeetle38 try adding "General/" before the arguments, i.e. batch_size becomes General/batch_size. This is only because we are accessing the parameters externally; when the task is executed it is resolved automatically
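For example, a rough sketch of setting that parameter externally on a cloned task (the task id and queue name are placeholders):
from clearml import Task

template = Task.get_task(task_id="<template_task_id>")          # hypothetical task id
cloned = Task.clone(source_task=template, name="clone with new batch size")
cloned.set_parameter("General/batch_size", 64)                   # note the "General/" prefix
Task.enqueue(cloned, queue_name="default")                       # placeholder queue name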