Are these experiments logged too (with the train-valid curves, etc)?
Yes, every run is logged as a new experiment (with its own set of HP). Do notice that the execution itself is done by the "trains-agent": the HP process creates experiments with a new set of HP and puts them into the execution queue, then trains-agent pulls them from the queue and starts executing them. You can have multiple trains-agents
on as many machines as you like with specific GPUs etc. each one ...
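For example, a couple of agents pinned to specific GPUs, all pulling from the same queue (the queue name "default" is just a placeholder here):
```
# machine A: serve the "default" queue on GPU 0
trains-agent daemon --queue default --gpus 0

# machine B: another agent on GPUs 1 and 2, same queue
trains-agent daemon --queue default --gpus 1,2
```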
MysteriousBee56 Okay, let's try this one:
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo done"
I mean to reduce the API calls without reducing the scalars that are logged, e.g. by sending less frequent batched updates.
Understood,
In my current trials I am using up the API calls very quickly though.
Why would that happen?
The logging is already batched (meaning one API call for a bunch of stuff)
Could it be lots of console lines?
BTW you can set the flush period to 30 sec, which would automatically collect and batch API calls
https://github.com/allegroai/clearml/blob/25df5efe7...
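Something along these lines, assuming your trains/clearml version exposes Logger.set_flush_period (project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="batched reporting")

# Accumulate reports for 30 seconds and send them as batched API calls,
# instead of flushing more frequently.
task.get_logger().set_flush_period(30.0)
```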
JitteryCoyote63 okay... but let me explain a bit so you get a better intuition for next time 🙂
The Task.init call, when running remotely, assumes the Task object already exists in the backend, so it ignores whatever was in the code and uses the data stored on the trains-server, similar to what's happening with Task.connect and the argparser.
This gives you the option of adding/changing the "output_uri" for any Task regardless of the code. In the Execution tab, change the "Output Destina...
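To make that concrete, a minimal sketch (the S3 bucket is hypothetical; when trains-agent runs the Task, the server-side "Output Destination" overrides it):
```python
from clearml import Task

# On a local run the output_uri below is used.
# When executed by trains-agent, the value stored on the server
# (the "Output Destination" field in the Execution tab) wins.
task = Task.init(
    project_name="examples",
    task_name="output uri demo",
    output_uri="s3://my-bucket/models",  # hypothetical destination
)
```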
Hi IrritableOwl63
Yes this seems like a docker setup issue 🙂
either run the agent with sudo (not really recommended 😉) or add your user to the docker group, see:
https://docs.docker.com/engine/install/linux-postinstall/
PompousParrot44 obviously you can just archive a task and run the cleanup service; it will actually delete archived tasks older than X days:
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
That is correct.
Obviously once it is in the system, you can just clone/edit/enqueue it.
Running it once is a means to populate the trains-server.
Makes sense?
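For example, once the template run is in the system, something along these lines (the task id, parameter name, and queue name are placeholders):
```python
from clearml import Task

# Clone the template Task created by the initial run,
# tweak a hyperparameter, and enqueue the clone for an agent.
template = Task.get_task(task_id="<template-task-id>")
cloned = Task.clone(source_task=template, name="cloned run")
cloned.set_parameter("General/learning_rate", 0.001)  # hypothetical parameter
Task.enqueue(cloned, queue_name="default")
```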
I couldn't change the task status from draft to complete
Task.completed(ignore_errors=True)
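In context that would look roughly like this (the task id is a placeholder; newer clearml versions expose mark_completed() instead):
```python
from clearml import Task

task = Task.get_task(task_id="<draft-task-id>")
# Force the status to "completed", ignoring state-transition errors.
task.completed(ignore_errors=True)
```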
Try to manually edit the "Installed Packages" (right click the Task, select "reset", now you can edit the section)
and change it to: -e git+ssh@github.com:user/private_package.git@57f382f51d124299788544b3e7afa11c4cba2d1f#egg=private_package
(assuming "pip install -e git+ssh@github.com:user/..." works, this should solve the issue)
(Just a thought, maybe we just need to combine Kedro-Viz ?)
DefeatedCrab47 no idea, but you are more than welcome to join the thread here, and point it out:
https://github.com/PyTorchLightning/pytorch-lightning-bolts/issues/249
I mean the caching will work, but it will reinstall this repository on top of the cached copy.
Makes sense?
It's in the docker image; doesn't the git clone command run in the container?
Then this should have worked.
Did you pass in the configuration: force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L25
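In the agent's clearml.conf that flag sits under the agent section; a minimal sketch (the key name comes from the linked file, the surrounding layout is assumed):
```
agent {
    # clone git repositories over SSH instead of converting them to HTTPS
    force_git_ssh_protocol: true
}
```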
SarcasticSparrow10 sure see "execute_remotely" it does exactly that:
https://allegro.ai/docs/task.html#trains.task.Task.execute_remotely
It will stop the current process (after syncing everything) and launch itself remotely (i.e. enqueue itself)
When the same code is run by the "trains-agent", the execute_remotely call becomes a no-operation and is basically skipped
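Roughly like this (the queue name is a placeholder and train() stands in for your actual code):
```python
from clearml import Task


def train():
    # placeholder for the actual training code
    print("training...")


task = Task.init(project_name="examples", task_name="remote training")

# Sync everything logged so far, stop the local process,
# and enqueue this very Task for an agent to run.
task.execute_remotely(queue_name="default", exit_process=True)

# Only reached on the agent, where execute_remotely is a no-op.
train()
```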
TrickySheep9 Yes, let's do that!
How do you PR a change?
JuicyFox94
NICE!!! this is exactly what I had in mind.
BTW: you do not need to put the default values there; it reads the defaults from the package itself (trains-agent/trains) and uses the conf file as overrides, so this section only needs to contain the parts that matter (like cache location, credentials, etc.)
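So an override file can be as small as, say (values here are purely illustrative, not real defaults or credentials):
```
# everything not listed here falls back to the package defaults
api {
    credentials {
        access_key: "YOUR-ACCESS-KEY"
        secret_key: "YOUR-SECRET-KEY"
    }
}
agent {
    venvs_dir: ~/.trains/venvs   # custom cache location
}
```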
BattyLion34
Maybe something inside the task is different?!
Could you run these lines and send me the result:
from clearml import Task
print(Task.get_task(task_id='failing task id').export_task())
print(Task.get_task(task_id='working task id').export_task())
Could it be it was never allocated to begin with?
I'll try to create a more classic image.
That is always better, though I remember we have some flag to allow that, you can try with:
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=1 clearml-agent ...
Thanks ElegantCoyote26 I'll look into it. Seems like someone liked our automagical approach 🙂
TenseOstrich47 you can actually enter this script as part of the extra_docker_shell_script
This will be executed at the beginning of each Task inside the container, and as long as the execution time is under 12h, you should be fine. wdyt?
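In the agent's clearml.conf that would look something like this (the commands and script path are placeholders):
```
agent {
    # shell commands executed inside the docker container
    # before each Task starts running
    extra_docker_shell_script: [
        "apt-get install -y jq",          # example extra dependency
        "/opt/scripts/refresh_token.sh",  # hypothetical credentials-refresh script
    ]
}
```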
Ephemeral Dataset, I like that! Is this like splitting a dataset, for example, then training/testing, and deleting it when done? Making sure the entire pipeline is reproducible, but without storing the data long term?
Hi MiniatureCrocodile39
I would personally recommend the ClearML show 😉
https://www.youtube.com/watch?v=XpXLMKhnV5k
https://www.youtube.com/watch?v=qz9x7fTQZZ8
Hi RipeGoose2
So the http://app.community.clear.ml already contains it.
Next release of the standalone server (a.k.a clearml-server) will include it as well.
I think the ETA is end of the year (i.e. 2 weeks), but I'm not sure on the exact timeframe.
Sounds good?
Yea I know, I reported this
LOL, apologies, these days it's a miracle I still remember my login passwords 😉