Now in case I needed to do it, can I add new parameters to cloned experiment or will these get deleted?
Adding new parameters is supported ๐
make sure you follow all the steps :
https://clear.ml/docs/latest/docs/deploying_clearml/upgrade_server_linux_mac
(basically make sure you get the latest docker-compose.yml and the pull itcurl
-o /opt/clearml/docker-compose.yml docker-compose -f /opt/clearml/docker-compose.yml pull docker-compose -f /opt/clearml/docker-compose.yml up -d
Essentially, I think the key thing here is we want to be able to build the entire Pipeline including any updates to existing pipeline steps and the addition of new steps without having to hard-code any Task IDโs and to be able to get the pipelineโs Task ID back at the end.
Oh if this is he case then basically you CI/CD code will be something like:
@PipelineDecorator.component(return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_...
I mean clone the Task in the UI (right click Clone), then go to the execution Tab, to the "installed packages" section, then click on Edit -> go to the torchvision http link, and replace it with torchvision == 0.7.0
and save.
Then right enqueue the Task (to the default queue) and see if the Agent can run it,
DeterminedToad86 Make sense ?
BTW:
Task.add_requirements('tensorflow', '2.2') will make sure you get the specified version ๐
CooperativeFox72 btw, are you guys running those 20 experiments manually or through trains-agent ?
As a result, I need to do somethig which copies the files (e.g. cp -r or StorageManager.upload_folder(โbโ, โaโ)
but this is expensive
You are saying the copy is just wasteful (but you do have the files locally)?
Whats the trains server IP? It seems everything is configured with local host?
Ohh SubstantialElk6 please use agent RC3, (latest RC is somewhat broken sorry, we will pull it out)
With default settings, to upload 2 datasets of 120 GB and 70 Gb it took more than 6 hours!
SmugSnake6 at the end s the an outcome of limited bandwidth or limited CPU ?
SmarmyDolphin68 okay what's happening is the process exists before the actual data is being sent (report_matplotlib_figure is an async call, and data is sent in the background)
Basically you should just wait for all the events to be flushedtask.flush(wait_for_uploads=True)
That said, quickly testing it it seems it does not wait properly (again I think this is due to the fact we do not have a main Task here, I'll continue debugging)
In the meantime you can just dosleep(3.0)
And it wil...
Hi @<1523704157695905792:profile|VivaciousBadger56>
You should replace
task.mark_completed()
with:
task.close()
To your point
parameters = task.connect(parameters)
Will be retrieved with:
task.get_parameters()
fyi:
connect_configuration -> get_configuration_objects
Hi WittyOwl57
I'm guessing clearml is trying to unify the histograms for each iteration, but the result is in this case not useful.
I think you are correct, the TB histograms are actually a 3d histograms (i.e. 2d histograms over time, which would be the default for kernel;/bias etc.)
is there a way to ungroup the result by iteration, and, is it possible to group it by something else (e.g. the tags of the two plots displayed below side by side).
Can you provide a toy example...
Hmm I would recommend passing it as an artifact, or returning it's value from the decorated pipeline function. Wdyt?
GiganticTurtle0 so this was already supposed to be out (v1.1) but a minor py2 backwards compatibility delayed it. Anyhow you can now just call pipeline.start(..)
https://github.com/allegroai/clearml/blob/889d2373988a0d6630703cc1c865e09e58f8f981/examples/pipeline/pipeline_from_tasks.py#L47
(to run it locally call start_locally(...) )pip install git+
(the new version will be out in a few days, meanwhile you can test the new pipeline interface directly from git)
Hi SkinnyPanda43
In your local machine do not pass output_uri at all, so nothing will be uploaded.
On the agent's configuration file configure, default_output_uri
to the S3 bucket
(Notice you can always override them in the UI, see the bottom of the execution Tab)
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L312
If this is the case why not have the stream process call the rest api, then move forward with the result? This way it scales out of the box, the main "conceptual" difference is that the restapi is used internally, and the upside is the event streaming processing becomes part of the application layer, not tied with the compute cost of the model , wdyt?
Based on the log you have shared:OSError: [Errno 28] No space left on device
I would increase the storage ?
https://github.community/t/github-actions-failing-with-errno-28-no-space-left-on-device/18164/10
https://stackoverflow.com/questions/70175977/multiprocessing-no-space-left-on-device
https://groups.google.com/g/ansible-project/c/4U6MyvyvthQ
I would start by increasing the size of the TMPDIR folder
PompousBeetle71 so basically exclude parameters that are considered "local" only, so that other people will not accidentally use them?
Are you referring to Poetry ?
Hi AttractiveCockroach17
. Many of these experiments appear with status running on clearml even though they have finish running,
Could it be their process just terminated? (i.e. not properly shutdown) ?
How are you running these multiple experiments?
BTW: if the server does not see any change in a Task for (I think the default is 2 hours) it will automatically mark these Task as aborted
FierceHamster54 are you sure you have write permissions ?
If this is the case then the easiest is:from clearml.backend_api.session.client import APIClient client = APIClient() res = client.events.get_task_plots(task="<task-id>")
We should defiantly have a nice interface ๐
I'm guessing the extra index URL can be a URL to the github repo of interest?
The extra index URL is exactly what you would be passing to pip install, meaning it has to comply to pypi artifactory api.
Make sense ?
os.environ['TRAINS_PROC_MASTER_ID'] = args.trains_id
it should be '1:'+args.trains_id
os.environ['TRAINS_PROC_MASTER_ID'] = '1:{}'.format(args.trains_id)
Also str(randint(1, sys.maxsize))
Hi @<1651395720067944448:profile|GiddyHedgehong81>
However I need for a yolov8 (Object detection with arround 20k jpgs and .txt files) the data.yaml file:
Just add the entire folder with your files to a dataset, then get it in your code
Add files (you can do that from CLI for example): None
clearml-data add --files my_folder_with_files
Then from code: [Non...
Ohh RotundHedgehog76 this implies a single jupyter hub with multiple uses, is that correct ?
(if this is the case, then yes, clearml-session is definitely not the correct solution, I would look for a helm chart for jupyter hub)