Or use python:3.9 when starting the agent
This is probably the best solution 🙂
And it works correctly when running on my computer, but when I use Colab it has no effect for some reason.
I think I'm lost on this one. When running in Colab, is this continuing a previous experiment?
BTW: GreasyPenguin14 you can also upload them as debug samples (when setting the output_uri, the debug samples will be uploaded to the same destination)
https://github.com/allegroai/clearml/blob/6b9297660e0ed83a77bce3da2fab384c552206fd/examples/reporting/image_reporting.py#L21
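For reference, a minimal sketch of what that could look like (the project/task names and the bucket URL are placeholders):
```
from clearml import Task, Logger

# per the note above, output_uri also sets where the debug samples are uploaded
task = Task.init(project_name="examples", task_name="image reporting",
                 output_uri="s3://my-bucket/clearml")

# report an image as a debug sample
Logger.current_logger().report_image(
    title="debug", series="sample", iteration=0, local_path="my_image.png")
```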
Because we are working with very big files, having them stored at multiple locations is something we try to avoid
Just so I better understand, is this for storing files as part of a dataset, or as debug samples ?
In other words can two diff processes create the exact same file (image) ?
trains-agent build --docker nvidia/cuda --id myTaskId --target base_env_services
It's building a GPU-enabled docker...
you might want a diff container or to specify --cpu-only
A few implementation / design details:
When you run code with Trains (and call init) it will record your environment (python packages, git code, uncommitted changes, etc.). Everything is stored on the Task object in the trains-server. When you clone a task you literally create a copy of the Task object (i.e. a second experiment). On the cloned experiment you can edit everything (parameters, git, base docker image, etc.). When you enqueue a Task you add its ID to the execution queue list, a trains-a...
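Roughly the same flow through the SDK would look something like this (the task ID, parameter name and docker image are placeholders):
```
from clearml import Task

# clone a recorded experiment - i.e. create a second copy of the Task object
cloned = Task.clone(source_task="<task_id>", name="cloned experiment")

# edit anything on the clone before it runs (parameters, base docker image, etc.)
cloned.set_parameters_as_dict({"General/learning_rate": 0.01})
cloned.set_base_docker("nvidia/cuda")

# add its ID to the execution queue, an agent listening on that queue will pick it up
Task.enqueue(cloned, queue_name="default")
```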
(without having to execute it first on Machine C)
Someone somewhere has to create the definition of the environment...
The easiest way to go about it is to execute it once.
You can add the following line to your code: task.execute_remotely(queue_name='default')
This will cause your code to stop running and enqueue itself on a specific queue.
Quite useful if you want to make sure everything works (like run a single step), then continue on another machine.
Notice that switching between cpu...
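Something along these lines (project/task/queue names are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# run everything up to here locally (e.g. verify a single step works),
# then stop, enqueue this task on the 'default' queue and exit the local process
task.execute_remotely(queue_name="default", exit_process=True)

# from this point on the code only runs on the agent that pulled the task
print("running on the remote machine")
```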
And having a pdf is easier/better than sharing a link to the results page ?
and of course: task.set_parameters_as_dict(params)
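e.g. (the dict contents are just placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="params example")
params = {"batch_size": 32, "learning_rate": 0.001}
task.set_parameters_as_dict(params)
```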
Hi DrabCockroach54
This seems like a pip issue trying to install from source. Try upgrading the pip version before installing numpy, it should solve it 🤞
CheerfulGorilla72
upd: I see NaN in TensorBoard, and 0 in ClearML.
I have to admit, since NaN's are actually skipped in the graph, should we actually log them ?
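If you prefer to filter them on your side, a rough sketch (the metric names are made up):
```
import math
from clearml import Logger

def report_value(value, iteration):
    # skip NaNs client-side instead of sending them to the server
    if value is None or math.isnan(value):
        return
    Logger.current_logger().report_scalar(
        title="loss", series="train", value=value, iteration=iteration)
```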
Right, if this is the case, then just use 'title/name 001'
it should be enough (I think this is how TB separates title/series or metric/variant)
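i.e. on the TensorBoard side something like this (assuming torch's SummaryWriter, the tag text is a placeholder):
```
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
# 'title/name 001' should map to metric 'title' and variant 'name 001'
writer.add_scalar("title/name 001", 0.5, global_step=0)
writer.close()
```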
MysteriousBee56 yes, please change the trains code!!! Wee pee, if you think someone else can benefit, feel free to PR :)
Regrading the double entry, that seems like an odd bug, how can I reproduce it?
If this is the case:
dataset = Dataset.get(...)
dataset.get_dependency_graph()
https://clear.ml/docs/latest/docs/references/sdk/dataset#get_dependency_graph
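i.e. something like (the project/name values are placeholders):
```
from clearml import Dataset

# fetch an existing dataset version
dataset = Dataset.get(dataset_project="examples", dataset_name="my_dataset")

# mapping of dataset id -> list of parent dataset ids it was built from
graph = dataset.get_dependency_graph()
print(graph)
```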
BroadMole98
I'm still exploring what trains is for.
I guess you can think of Trains as Experiment manager + MLOps tied together.
The idea is to give a quick and easy way to move from coding/running on one machine to scaling it to multiple remote machines, with everything that comes with it.
In some ways it is like Snakemake: it sets up your environment and executes the code. Snakemake also allows you to set up data, which in Trains is done via code (StorageManager), pipelines are also...
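For example, pulling data in code with the StorageManager looks roughly like this (the bucket URL is a placeholder):
```
from clearml import StorageManager

# download (and locally cache) a remote object, returns the local file path
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/data/train.zip")
print(local_path)
```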
I was wondering about what I can do with the agent's argparse magic
You mean how to pass arguments to the components of a pipeline? btw did you check the pipeline example here?
Hi CostlyElephant1
what seems to be the issue? I could not locate anything in the log
"Environment setup completed successfully
Starting Task Execution:"
Do you mean it takes a long time to setup the environment inside the container?
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL and CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL,
It seems to be working, as you can see no virtual environment is created, the only thing that is installed is the clearml-agent that i...
The pipeline stores the state of its previous run, specifically the executed steps.
In our case the executed step was reset (I assume) so it cannot find the output model you are referring to, hence crashing
CleanPigeon16 make sense ?
Basically run the agent in virtual environment mode. JumpyDragonfly13 try this one (notice no --docker flag):
clearml-agent daemon --queue interactive --create-queue
Then from the "laptop" try to get a remote session with:
clearml-session
GrievingTurkey78 I'm not sure I follow, are you asking how to add additional scalars ?
GrievingTurkey78 short answer no 😞
Long answer, the files are stored as differential sets (think change sets from the previous version(s)). The collection of files is then compressed and stored as a single zip. The zip itself can be stored on Google, but on their object storage (not GDrive). Notice that the default storage for clearml-data is the clearml-server, that said you can always mix and match (even between versions).
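Through the SDK, storing a dataset version on your own object storage would look roughly like this (bucket URL, project and dataset names are placeholders):
```
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="my_dataset")
ds.add_files(path="./data")                      # collect the files / change set
ds.upload(output_url="gs://my-bucket/datasets")  # store the compressed zip on your bucket instead of the clearml-server
ds.finalize()
```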
Can you share the StorageManager usage, and the error you are getting?
RobustGoldfish9
I think you need to set the trains-agent docker to be aware of the host, so it knows how to mount data/cache/configurations into the sibling docker
It should look something like: TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains"
So if running a docker: docker run -e TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains" ...
Any chance there is an env variable you set to get 1.5.0rc0? Because this is the version that is being used