at means I need to pass a single zip file to
path
argument in
add_files
, right?
actually the opposite, you pass a folder (of files) to add_files. Then add_files remembers the files location (and pre calculates the hash of the files content). When you call upload
it will actually compress the files that changed into a zip file (or files depending on the chunk size), and upload the files to the destination (as specified in the upload
call...
docstring ?
Usually the preferred way is StorageManager
https://clear.ml/docs/latest/docs/references/sdk/storage
https://clear.ml/docs/latest/docs/integrations/storage
ColossalDeer61 btw, it turns out the docker-compose services docker was ill configured on the GitHub 😞 I suggest you get the latest copy of it:curl
-o docker-compose.yml
Any idea where that could come from? Could we turn off the local logging as well - in these kinds of runs we don’t need it?
It is supposed to create it automatically... I tested with other examples (clearml version 1.7.3rc1) everything seems to work
What am I missing? how do we recreate the issue ? can you verify it is still not working with the latest RC?
SteadyFox10 With pleasure 🙂
BTW: you can retrieve the Task id from its name withTask.get_tasks(project_name='my project', task_name='my task name')
See https://allegro.ai/docs/task.html?highlight=get_tasks#trains.task.Task.get_tasks
Oh think I understand you point now.
basically you can:
Create the initial Task, once it is in the system clone it and adjust parameters externally. A simple example here:
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/automation/manual_random_param_search_example.py#L41
wdyt?
The easiest if export_task / update_task:
https://allegro.ai/docs/task.html#trains.task.Task.export_task
https://allegro.ai/docs/task.html#trains.task.Task.update_task
Check the structure returned by export_task, you'll find the entire configuration test there,
then, you can use that to update back the Task.
BTW:
Partial update is also supported...
EnviousStarfish54 data versioning on the open source leverages the artifacts and storage and caching capabilities of Trains.
A simple workflow
- Upload data
https://github.com/allegroai/events/blob/master/odsc20-east/generic/dataset_artifact.py - Preprocessing data
https://github.com/allegroai/events/blob/master/odsc20-east/generic/process_dataset.py - Using data
https://github.com/allegroai/events/blob/master/odsc20-east/scikit-learn/sklearn_jupyter.ipynb
About .get_local_copy... would that then work in the agent though?
Yes it would work both locally (i.e. without agent) and remotely
Because I understand that there might not be a local copy in the Agent?
If the file does not exist locally it will be downloaded and cached for you
or point to the self signed certificate:export REQUESTS_CA_BUNDLE=/path/to/your/certificate.pem
SubstantialElk6 I just realized 3 weeks passed, wow!
So the good news we have some new examples:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_functions.py
The bad news the documentation was postponed a bit, as we are still messaging the interface (the community is constantly pushing for great ideas and uses cases , and they are just too good to miss out 🙂 )...
Ohh I see.
In your web app, look for the "?" icon (bottom left corner), click on it, it should open the full platform documentation
PompousParrot44 unfortunately not yet 😞
But the gist is :
MongoDB stores experiment data (i.e. execution parameters, git ref etc.)
ElasticSearch stores results (i.e. metrics console logs, debug image links etc.)
Does that help?
Could you maybe send a screenshot? This is very strange? Also what's the trains version?
generally speaking the agent will convert the repo url to the auth scheme it is configured with, ssh->hhtp if using user/pass, and http->ssh if using ssh
Hmm that should have worked ...
I'm assuming the Task itself is running on a remote agent, correct ?
Can you see the changes in the OmegaConf section ?
what happens when you pass--args overrides="['dataset.path=abcd']"
Error 101 : Inconsistent data encountered in document: document=Output, field=model
Okay this point to a migration issue from 0.17 to 1.0
First try to upgrade to 1.0 then to 1.0.2
(I would also upgrade a single apiserver instance, once it is done, then you can spin the rest)
Make sense ?
Hi HealthyStarfish45
You can disable the entire TB logging :Task.init('examples', 'train', auto_connect_frameworks={'tensorflow': False})
ReassuredTiger98
How can I make clearml-agent use pre-installed version from the nvidia/pytorch
If the Same version is required, the agent will not try to reinstall it (the new venv the agent is creating inside the container, inherits from the preinstalled system packages)
Comes with PyTorch Version 1.12 based on a commit
. I tried
torch >= 1.11
,
torch == 1.12
If in your installed packages you have torch==1.12
the agent should not tr...
EnviousStarfish54
oh, this is a bit different from my expectation. I thought I can use artifact for dataset or model version control.
You totally can use artifacts as a way to version data (actually we will have it built in in the next versions)
Getting an artifact programmatically:
Task.get_task(task_id='aabb'). artifacts['artifactname'].get()
Models are logged automatically. No need to log manually
So if you are using the latest clearml (i.e. +1.3) reenqueuing the pipline will automatically continue it from where it stopped.
With previous versions (which is your case, I think), you clone the pipeline Task, change the parameter and enqueue it.
(The state itself of the pipeline is stored on the Task, and when you clone it, you are cloning the state as well).
Make sense ?
DeliciousBluewhale87 You can havwe multiple queues for the k8s queuea in priory order:python k8s_glue_example.py --queue glue_q_high glue_q_low
Then if someone is doing 100 experiments (say HPO), then they push into the "glie_q_low" which means it will first pop Tasks from the high priority queue and if it is empty it will pop from the low priority queue.
Does that make sense ?
Can you let me know if i can override the docker image using template.yaml?
No, you cannot.
But you can pass OS environment "CLEARML_DOCKER_IMAGE" to set a diff default one
Are you running the agent in docker mode? or venv mode ?
Can you manually ssh on port 10022 to the remote agent's machine ?ssh -p 10022 root@agent_ip_here
hmmm, somehow I have a bed feeling about it... Could you check the log, it should say something like "Collecting torch==1.6.0.dev20200421+cu101 from https://"
It should be right at the top of the installation. What do you have there?
We are always looking for additional talented people 😉 DM me...
replace it with:git+
No need for the repository name, this will ensure you always reinstall it (again pip feature)