Hi ReassuredTiger98
I do not want to create extra queues for this since this will not be able to properly distribute tasks.
Queues are the way to abstract different resources into "compute capabilities". They create a simple interface for users on the one hand and let you control the compute on the other. Agents can listen to multiple queues with priority: an RTX agent can pull from an RTX queue, and if it is empty, it will pull from the "default" queue. Would that work for ...
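From the user's side, a minimal sketch (the "rtx" queue and the task names are just placeholders for whatever you set up):
from clearml import Task

# assumption: "rtx" is a queue you created; an RTX agent listens to it first and falls back to "default"
template = Task.get_task(project_name="examples", task_name="train")  # placeholder template task
cloned = Task.clone(source_task=template, name="train on RTX")
Task.enqueue(cloned, queue_name="rtx")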
Hmm, I think the approach in general would be to create two pipeline tasks, then launch them from a third pipeline or trigger them externally. If, on the other hand, it makes sense to see both pipelines on the same execution graph, then the nested components make a lot of sense. Wdyt?
AbruptWorm50 my apologies, I think I misled you. Yes, you can pass generic arguments to the optimizer class, but specifically for Optuna this is disabled (not sure why)
Specifically to your case, the way it works is:
your code logs to tensorboard, clearml catches the data and moves it to the Task (on clearml-server), the optuna optimization is running on another machine, and trial values are manually updated (i.e. the clearml optimization pulls the Task reported metric from the server and updat...
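A hedged sketch of that flow with ClearML's HyperParameterOptimizer and the Optuna backend (the base task id, parameter path, and metric names are placeholders):
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

# controller task driving the optimization (can run on a different machine than the trials)
task = Task.init(project_name="examples", task_name="hpo controller")

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",        # the experiment whose tensorboard scalars clearml captured
    hyper_parameters=[UniformParameterRange("General/lr", min_value=1e-5, max_value=1e-1)],
    objective_metric_title="validation",  # scalar title/series as reported on the Task
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=OptimizerOptuna,      # Optuna pulls the reported metric from the server to update each trial
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()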
Hi @<1523701868901961728:profile|ReassuredTiger98>
Could you send the full log? Also, what's the clearml-agent version?
Hi ShakyJellyfish91
It seems clearml is using a single connection, which takes a long time to download
Hmm, I found this one:
https://github.com/allegroai/clearml/blob/1cb5dbb276026644ae20fef63d58256cdc887818/clearml/storage/helper.py#L1763
Does max_connections=10 mean 10 concurrent connections?
Using the dataset.create command and the subsequent add_files and upload commands, I can see the upload action as an experiment, but the data does not appear on the Datasets webpage.
ScantCrab97 it might be that you need the latest clearml package installed on the client end (as well as the new server with the UI).
What is your clearml package version?
Hmm, you are correct
Which means this is some conda issue; basically, when installing from the env file, conda is not resolving the correct pytorch version.
Not sure why... Could you try upgrading conda?
Hi CheerfulGorilla72
I guess this is a documentation bug; is there a stable link for the latest docker-compose?
Where can I find information about that? I'd love to join!
This is awesome! We have a few things in mind that we would love to improve. Do you have a lot of experience working with Trains? If you do, what would be most appealing to you?
Well, I guess you can say this is definitely not a self-explanatory line,
but it is actually asking whether we should extract the archive. Think of it as:
if extract_archive and cached_file:
    return cls._extract_to_cache(cached_file, name)
Where again does clearml place the venv?
Usually ~/.clearml/venvs-builds/<python version>/
Multiple agents will use venvs-builds.1, and so on
How does this work in the context of a pipeline?
Is your pipeline from functions / decorators, or is it from Tasks?
(If it is Tasks, then just change the entry point in the overrides.)
In the case of functions or decorators, you have to do that manually (i.e. your function needs to do the equivalent of "accelerate launch"):
from accelerate.commands.launch import launch_command, launch_command_parser
parser = launch_command_parser()
args = parser.parse_args("-command -here".split())
launch_command(args)
well.. having the demo server by default lowers the effort threshold for trying ClearML and getting convinced it can deliver what it promises, and maybe test some simple custom use cases. I
This was exactly what we thought when we set it up in the first place
(I can't imagine the cost is an issue, probably maintenance/upgrades ...)
There is still support for the demo server, you just need to set the env key:
CLEARML_NO_DEFAULT_SERVER=0 python ...
os.environ['TRAINS_PROC_MASTER_ID'] = args.trains_id
it should be '1:'+args.trains_id
os.environ['TRAINS_PROC_MASTER_ID'] = '1:{}'.format(args.trains_id)
Also str(randint(1, sys.maxsize))
default is clearml data server
Yes, the default is the clearml files server. What did you configure it to? (e.g. it should be something like None )
The RC you can see on the main readme (for some reason the Conda badge will show the RC and the PyPI one won't)
https://github.com/allegroai/clearml/
I think the only way is using the API, with Task.query_tasks and a filter. Would that have helped?
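Something along these lines (a sketch; the project/experiment names are placeholders):
from clearml import Task

# returns a list of matching Task IDs
task_ids = Task.query_tasks(project_name="my_project", task_name="training")
print(task_ids)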
I'll make sure we have conda ignore git:// packages, and pass them to the second pip stage.
Hi @<1661542579272945664:profile|SaltySpider22>
question 1: are parallel writes to a dataset with the same version possible?
When you say parallel, what do you mean? From multiple machines?
What's the recommended way to append to the dataset in a future version?
Once a dataset is finalized, the only way to add files is to create another version that inherits from the previous one (i.e. the finalized version becomes the parent of the new version).
If you are worried about multip...
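A minimal sketch of that flow (project/dataset names and paths are placeholders):
from clearml import Dataset

# the finalized dataset becomes the parent of the new version
parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

child = Dataset.create(
    dataset_project="my_project",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],
)
child.add_files(path="new_files/")   # only the added/changed files are uploaded
child.upload()
child.finalize()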
I didn't realise that pickling is what triggers clearml to pick it up.
No, pickling is the only thing that will Not trigger clearml (it is just too generic to automagically log)
ConvolutedSealion94 try scikit, not scikitlearn
I think we should add a warning if a key is there and is being ignored... let me make sure of that
I just disabled all of them with
auto_connect_frameworks=False
Yep that also works
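If you only want to silence specific frameworks, a dict works as well; a sketch with placeholder names (note the "scikit" key from above):
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="selective logging",
    auto_connect_frameworks={
        "scikit": False,      # disable scikit-learn auto-logging
        "matplotlib": False,  # disable matplotlib capture
        "tensorboard": True,  # keep TensorBoard capture
    },
)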
UI for some anomalous file,
Notice the metrics are not files/artifacts, just scalars/plots/console
Hmm I think this is not doable ...
(the underlying data is stored in DBs and changing it is not really possible without messing about with the DB)
yes they do
Hmm okay, I think the takeaway is that we should print "missing notebook package"
For example, store inference results, explanations, etc., and then use them in a different process. I currently use a separate database for this.
You can use artifacts for complex data and then retrieve them programmatically.
Or you can manually report scalars / plots etc. with the Logger class; you can also retrieve them with task.get_last_scalar_metrics
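A rough sketch of both options (project/task names and the task id are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="inference")

# complex objects (dicts, dataframes, file paths, ...) go in as artifacts
task.upload_artifact(name="inference_results", artifact_object={"accuracy": 0.91})

# scalars are reported through the Logger
task.get_logger().report_scalar(title="metrics", series="accuracy", value=0.91, iteration=0)

# later, from a different process
other = Task.get_task(task_id="<task_id>")
results = other.artifacts["inference_results"].get()
scalars = other.get_last_scalar_metrics()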
I see that you guys have made a lot of progress in the last two months! I'm excited to dig in
Thank you!
You can further di...
pass: task_filter=dict(system_tags=['-archived'])
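For example (a sketch with a placeholder project name):
from clearml import Task

# exclude archived experiments via the system-tags filter
tasks = Task.get_tasks(
    project_name="my_project",
    task_filter=dict(system_tags=['-archived']),
)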
I'm sorry, wrong line reference:
I'm assuming the error is due to ulimit missing:
try adding 16777216 to both soft/hard ulimit
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L58
Hmm ElegantKangaroo44, low memory might indeed explain the behavior
BTW: 1==stop request, 3=Task Aborted/Failed
Which makes sense if it crashed on low memory...