Are you running a Jupyter notebook inside VS Code?
Yey! BTW: what's the setup you are running it with? Does it include "manual" tasks? Do you also report on completed experiments (not just failed ones)? Do you filter by iteration numbers?
Okay, I was able to reproduce. This will only happen if you are running from a daemon process (like in the case of a process pool). Python is sometimes very picky when it comes to multi-threading/processes. I'll check what we can do 🙂
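For reference, a minimal repro sketch of the situation (project/task names are placeholders; `multiprocessing.Pool` workers are daemon processes, which is what triggers it):
```python
from multiprocessing import Pool

from clearml import Task

def worker(i):
    # Task.init() from inside a daemon pool worker is what triggers the issue
    task = Task.init(project_name="examples", task_name=f"pool-task-{i}")
    task.close()

if __name__ == "__main__":
    with Pool(2) as pool:
        pool.map(worker, range(2))
```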
I believe that happens natively thanks to pyhocon? No idea why it fails on Mac
That's the only explanation ...
But the weird thing is, it did not work on my Linux box?!
Sounds good, let's work on it after the weekend 🙂
All the 3 steps can be found here:
https://github.com/allegroai/trains/tree/master/examples/pipeline
Sadly, I think we need to add another option like `task_init_kwargs` to the component decorator.
What do you think would make sense?
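Something like this hypothetical usage, maybe (note: `task_init_kwargs` does not exist yet, it is exactly the option being proposed here, and the kwargs shown are just an example):
```python
from clearml.automation.controller import PipelineDecorator

# hypothetical option: extra kwargs forwarded to the component's Task.init()
@PipelineDecorator.component(
    return_values=["data"],
    task_init_kwargs={"output_uri": "s3://my-bucket/models"},  # proposed, not in the current API
)
def step_one(data):
    return data
```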
PleasantGiraffe85
it took the repo from the cache. When I delete the cache, it can't get the repo any longer.
what error are you getting? (are we talking about the internal repo?)
GrotesqueDog77 one issue with this design: in order to run a sub-component, the call must be done from the parent component. Does that make sense?
```python
def step_one(data):
    return data

def step_two(path):
    model = path  # placeholder: load from path and return a model
    return model

def both_steps():
    path = step_one("stuff")
    return step_two(path)

def pipeline():
    both_steps()
```
Which would make `both_steps` a component, and `step_one` and `step_two` sub-components
wdyt?
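For reference, a minimal sketch of how the parent component could be declared with the current decorator (names and project are placeholders; `helper_functions` makes the sub-steps available when the component runs standalone):
```python
from clearml.automation.controller import PipelineDecorator

def step_one(data):
    return data

def step_two(path):
    model = path  # placeholder: load from path and return a model
    return model

@PipelineDecorator.component(return_values=["model"], helper_functions=[step_one, step_two])
def both_steps():
    # the sub-steps are invoked from inside the parent component
    path = step_one("stuff")
    return step_two(path)

@PipelineDecorator.pipeline(name="sub-component example", project="examples", version="1.0")
def pipeline():
    both_steps()

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    pipeline()
```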
EnviousPanda91 the host checks if you have a .ssh folder on the machine; if you do, it will copy+mount it into the container, then delete the copy when the container is torn down.
Specifically, /tmp/clearml_agent.ssh.rbw8o0t7 is the copy of the .ssh folder that the agent created, and it is now mounting it into the container
EnviousPanda91
in your clearml.conf I think you are missing a section:
```
agent.git_user=""
agent.git_pass=""
agent.git_host=""
agent.force_git_ssh_protocol: true
```
but why is it mounted only once?
Are you saying the second time this line is missing? This is very strange...
Can you send the full Task log?
Instead you can do: `TRAINS_WORKER_NAME="trains-agent:$DYNAMIC_INSTANCE_ID"`
Then the Worker ID will have the running instance appended to the worker name. This means that even if you use the same $DYNAMIC_INSTANCE_ID twice, you will not have two agents registering on the same name.
Why is it using an OutputModel and an InputModel?
So calling OutputModel will create the new Model entity and upload the data, while InputModel will store it as a required input Model.
Basically, on the Task you have input & output sections; when you clone the Task you copy the input section into the newly created Task, and the assumption is that when you execute it, your code will create the output section.
Here, when you clone the Task you will be cloning the reference to the InputModel (i...
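A minimal sketch of that flow (the model id and weights filename are placeholders):
```python
from clearml import Task, InputModel, OutputModel

task = Task.init(project_name="examples", task_name="model io")

# register an existing model as a required input of this Task
input_model = InputModel(model_id="<model_id>")
task.connect(input_model)

# create a new output Model entity and upload its weights
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="model.pt")
```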
is it possible to change an existing model's URL?
Edit the DBs ... That's basically the only way 😞
HandsomeCrow5 OMG the guys already added it to the debug samples as well, check out the demo app (drop-down "test html sample"):
https://demoapp.trains.allegro.ai/projects/4e7fef090aa849b1acc37d92b59b3360/experiments/83c9ed509f0e421eaadc1ef56b3af5b4/info-output/debugImages
Kind of, as it tries to do "apt-get install"...
what did you have in mind ?
HandsomeCrow5 if you want to edit the Task object you can just use:
```python
internal_task_representation = task.data
internal_task_representation.execution.script = ...
task._edit(execution=internal_task_representation.execution)
```
This will make sure you do not need to worry about API versions etc.; the Task object will take care of it.
BTW: it seems a few more people wanted this ability, maybe we should add a proper .edit method to Task. Thoughts?
It's always preferred to use `conda_freeze: false`. That said, if you do use `conda_freeze: true` it should also freeze the cudatoolkit, so it should have worked.
BTW when you say it worked, is it 0.17.2 version or the hacked RC I sent ?
so what should the value of "upload_uri" be set to, e.g. the fileserver_url?
yes, that would work.
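For reference, a minimal sketch (the URL is a placeholder for your own fileserver):
```python
from trains import Task

# route model/artifact uploads to the fileserver
task = Task.init(project_name="examples", task_name="upload example",
                 output_uri="http://localhost:8081")
```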
BTW: you can quite easily add an option to set the offline folder, check here:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/config/__init__.py#L31
PRs are always appreciated :)
It should also work with host IP and two docker compose files.
I'm not sure where to push for a unified docker-compose?
EnormousWorm79 you mean to get the DAG graph of the Dataset (like you see in the plots section)?
If possible, can we have an "only one experiment can be given a single tag" option?
You mean "moving a tag" automatically (i.e. if someone else had the same tag it is removed from it)?
SubstantialElk6 if you call Task.init with continue_last_task=<task_id>, it will automatically add the last_iteration of the previous run to any logging/report, so you never overwrite the previous reports 🙂
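A minimal sketch (the task id is a placeholder):
```python
from clearml import Task

# resume the previous run; reported iterations continue from its last_iteration
task = Task.init(project_name="examples", task_name="resume example",
                 continue_last_task="<previous_task_id>")
```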
Thanks GorgeousMole24
That is a very good point! Passing it to the product guys
This means that you guys internally catch the argparser object somehow, right?
Correct 🙂 this is how you get the type checking casting abilities, and a few other perks
Hi TrickyRaccoon92
... would any running experiment keep a cache of to-be-sent-data, fail the experiment, or continue the run, skipping the recordings until the server is back up?
Basically they will keep trying to send data to the server until it is up again (you should not lose any of the logs)
Is there any clever functionality for dumping experiment data to external storage to avoid filling up the server?
You mean artifacts or the database ?