This is an example of how one can clone an experiment and change it from code:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
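A minimal sketch of that idea, assuming a reachable trains-server; the task id, parameter name and queue name below are illustrative placeholders, not values from the linked example:

```python
# Hedged sketch: clone an existing experiment, override a parameter,
# and enqueue the clone for a trains-agent to execute.
# Requires the `trains` package and a reachable trains-server.

def clone_and_enqueue(template_task_id, queue_name="default"):
    from trains import Task  # imported here so the sketch stays importable

    template = Task.get_task(task_id=template_task_id)
    cloned = Task.clone(source_task=template, name=template.name + " (cloned)")

    # Override a hyperparameter on the clone (the parameter name is illustrative)
    params = cloned.get_parameters()
    params["General/learning_rate"] = 0.001
    cloned.set_parameters(params)

    # Hand the clone to an agent listening on the queue
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned.id
```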
A full HPO optimization process (basically the same idea only with optimization algorithms deciding on the next set of parameters) is also available:
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
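The same clone-and-enqueue mechanic, driven by an optimizer, looks roughly like this; a hedged sketch, assuming a reachable trains-server, with placeholder task id, metric names and parameter ranges (check the linked example for the exact constructor arguments):

```python
# Hedged sketch: the optimizer picks the next parameter set, clones the
# base task with those values and enqueues each trial for an agent.

def run_hpo(base_task_id, queue_name="default"):
    from trains.automation import (
        HyperParameterOptimizer,
        UniformParameterRange,
        DiscreteParameterRange,
    )

    optimizer = HyperParameterOptimizer(
        base_task_id=base_task_id,  # the experiment to clone per trial
        hyper_parameters=[
            UniformParameterRange("General/learning_rate",
                                  min_value=1e-4, max_value=1e-1),
            DiscreteParameterRange("General/batch_size", values=[16, 32, 64]),
        ],
        # the scalar the optimizer tries to minimize
        objective_metric_title="validation",
        objective_metric_series="loss",
        objective_metric_sign="min",
        execution_queue=queue_name,
        max_number_of_concurrent_tasks=2,
    )
    optimizer.start()
    optimizer.wait()
    optimizer.stop()
```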
Obviously if you click on them you will be able to compare based on specific metric / parameters (either as table or in parallel coordinates)
HealthyStarfish45 the PyCharm plugin is mainly for remote debugging; you can of course use it for local debugging too, but there the value is just being able to configure your user credentials and trains-server.
In remote debugging, it will make sure the correct git repo/diff is stored alongside the experiment (this is because PyCharm will not sync the .git folder to the remote machine, so without the plugin Trains will not know the git repo etc.)
Is that helpful ?
WackyRabbit7 hmmm seems like non regular character inside the diff.
Let me check something
would I have to execute each task in the pipeline locally (but still connected to trains),
Somehow you have to have the pipeline step Task in the system; you can import it from code, or you can run it once and then the pipeline will clone it and reuse it. Am I missing something?
that is because my own machine has 10.2 (not the docker, the machine the agent is on)
No, that has nothing to do with it; the CUDA is inside the container. I'm referring to this image: https://allegroai-trains.slack.com/archives/CTK20V944/p1593440299094400?thread_ts=1593437149.089400&cid=CTK20V944
Assuming this is the output from your code running inside the docker, it points to CUDA version 10.2
Am I missing something ?
BroadMole98 Awesome, can't wait for your findings 🙂
This is odd because the screen grab points to CUDA 10.2 ...
We might need to change the default base docker image, but I remember it was there... Let me check again
replace the base-docker-image and it should work fine 🙂
well cudnn is actually missing from the base image...
PompousParrot44 I think the website should address that:
https://allegro.ai/
But the TL;DR is: the enterprise version adds full Dataset Versioning on top, with end-to-end integration from code to DLOps (e.g. data sampling, database query capabilities, data visualization, multi-site support, permissions, etc.)
Hi SubstantialBaldeagle49
yes, you can back up the entire trains-server (see the GitHub docs on how). You mean upgrading the server? Yes. You can change the name or add comments (Info tab / description), and you can add key/value descriptions (under the Configuration tab, see User Properties)
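The backup itself boils down to archiving the server's data folder while the server is stopped; a minimal sketch, assuming the default /opt/trains paths (adjust to your install):

```shell
#!/bin/sh
# Hedged sketch: back up the trains-server data folder (MongoDB,
# Elasticsearch, fileserver files). Stop the server first so the
# databases are in a consistent state, then restart it.

backup_trains_data() {
    data_dir="$1"      # e.g. /opt/trains/data
    backup_file="$2"   # e.g. /opt/trains-backup.tgz
    # docker-compose -f /opt/trains/docker-compose.yml down
    tar -czf "$backup_file" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
    # docker-compose -f /opt/trains/docker-compose.yml up -d
}

# Example: backup_trains_data /opt/trains/data /tmp/trains-backup.tgz
```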
PompousBeetle71 could you check that the "output:destination" is the same for both experiments ?
Hi SteadyFox10 the way it works is that Trains limits the debug image history by reusing the same file names, so the UI will only present the iterations where the debug images are relevant. With your sample code it looks like it exposes a bug: the generated link should contain the iteration number, but it does not, and so it overwrites the debug images every iteration. Here is the image link: https://demofiles.trains.allegro.ai/Test/test_images.6ed32a2b5a094f2da47e6967bba1ebd0/metrics/Test/te...
MelancholyBeetle72 thanks! I'll see if we could release an RC with a fix soon, for you to test :)
GiddyTurkey39
as others will also be running the same scripts from their own local development machine
Which would mean `trains` will update the installed packages, no? This is why I was inquiring about the `requirements.txt` file,
My apologies, of course this is supported 🙂
If you have no "installed packages" (i.e. the field is empty in the UI) the trains-agent will revert to installing the `requirements.txt` from the git repo itself, then it...
TrickyRaccoon92 Thank you so much! 🙂
The azure section:
https://github.com/allegroai/trains/blob/master/docs/trains.conf#L117
Hmmm could you attach the entire log?
Remove any info that you feel is too sensitive :)
Okay, I'll make sure we change the default image to the runtime flavor of nvidia/cuda
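For reference, the default container can also be set per agent in the conf file; a hedged trains.conf fragment (the image tag is an assumption, pick the cudnn runtime flavor matching your drivers):

```
agent {
    default_docker {
        # the "-cudnn*-runtime" flavors ship with cudnn, unlike plain "runtime"
        image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
    }
}
```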
Hi LazyTurkey38
Configuring these folders will be pushed later today 🙂
Basically you'll have in your clearml.conf
```
agent {
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        apt_cache: "/var/cache/apt/archives"
        ssh_folder: "/root/.ssh"
        pip_cache: "/root/.cache/pip"
        poetry_cache: "/root/.cache/pypoetry"
        vcs_cache: "/root/.clearml/vcs-cache"
        venv_build: "/root/.clearml/venvs-builds"
        pip_download: "/root/.clearml/p...
```
when you are running the n+1 epoch you get the 2*n+1 reported
RipeGoose2 like twice the gap, i.e. internally it adds an offset of the last iteration... is this easily reproducible?
okay, let's PR this fix?
Hi UnsightlyShark53 apologies for this delayed reply, slack doesn't alert users unless you add @ , so things sometimes get lost :(
I think you pointed at the correct culprit...
Did you manage to overcome the circular include?
BTW, how could I reproduce it? It would be nice if we could solve it
You can always access the entire experiment data from python
`Task.get_task(Id).data`
It should all be there.
What's the exact use case you had in mind?
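A minimal sketch of that access pattern, assuming a reachable trains-server; the helper name is illustrative and the extra calls show nearby accessors you may also want:

```python
# Hedged sketch: pull an experiment's data from python via the Task object.
# Requires the `trains` package and a reachable trains-server;
# the task id is a placeholder.

def get_experiment_data(task_id):
    from trains import Task  # imported here so the sketch stays importable

    task = Task.get_task(task_id=task_id)
    data = task.data                  # raw task object: script, execution, etc.
    params = task.get_parameters()    # flat dict of hyperparameters
    return data, params
```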
Are you suggesting the default "ubuntu:18.04" is somehow contaminated ?
This is an official Ubuntu container (nothing to do with ClearML), this is Very Very odd...
WickedGoat98
Put the `agent.docker_preprocess_bash_script`
in the root of the file (i.e. you can just add the entire thing at the top of the trains.conf)
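For example, at the top of trains.conf (the script lines below are placeholders for whatever setup you need to run inside the container):

```
# executed inside the docker before the experiment starts
agent.docker_preprocess_bash_script = [
    "echo 'starting container setup'",
    "apt-get update",
]
```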
Might it be possible that I can place a trains.conf in the mapped local folder containing the filesystem and mongodb data etc e.g.
I'm assuming you are referring to the trains-agent services; if this is the case, sure you can.
Edit your docker-compose.yml, under line https://github.com/allegroai/trains-server/blob/b93591ec3226...