In the agent, no, it pipes stdout/stderr of the container and logs everything π
to get a json or something like that?
There is an api to get all the console logs, is this what you are after?
Hi SmugTurtle78
Unfortunately there is no actual filtering for these logs, because they are so important for debugging and visibility. I have to ask, what's the use case to remove some of them ?
Yes, but does add_external_files makes chunked zips as add_files do?
No it references them, (i.e. meta-data not actually doing something with the files themselves)
I need the zipping, chunking to manage millions of files
That makes sens, if that's the case you will have to download those files anyway, and then add them with add_files
you can use the StoargeManager to download them, and then add them from the local copy (this will zip/chunk them)
[None](https://clear.ml/docs/la...
Hi @<1686547344096497664:profile|ContemplativeArcticwolf43>
In the 2nd 'Getting Started' tutorial,
Could you send a link to the specific notebook?
. But whenever a task is picked, it fails for the following
You mean after the Task.init
call?
FYI:ssh -R 8080:localhost:8080 -R 8008:localhost:8008 -R 8081:localhost:8081 replace_with_username@ubuntu_ip_here
solved the issue π
Hi QuaintJellyfish58
You can always set it inside the function, withTask.current_task().output_uri = "s3://"
I have to ask, I would assume the agents are pre-configured with "default_output_uri" in the clearml.conf, why would you need to set it manually?
Hi @<1544853721739956224:profile|QuizzicalFox36>
Sure just change the ports on the docker compose
I'm going to follow your suggestion and just put the extra effort into distributing a pre-built image.
That sounds good π
If you feel the need is important, I do have a hack in mind, it will be doable once we have support for entrypoint "-c python_code_here". But since this is still not available I believe you are right and build an image would be the easiest.
A note on the docker image, remember that when running inside the docker we inherit the system packages installed on the d...
I see, would having this feature solve it (i.e. base docker + bash init script)?
https://github.com/allegroai/trains/issues/236
Hi UnsightlyBeetle11
Is it possible to report the model's architecture (PyTorch model) automatically on ClearML, as we do it via Netron or other neural network visualisation tools?You mean like the actual network layout? Unfortunately, there is currently no option to do that, you can however manually store a plot/image that represents it
BTW:I think that at the beginning Netron was somehow integrated, but it was rarely used and support for it was not trivial so it was phased out. You can ho...
Awesome, PRs are always welcome, and we try to help with any request and feature coming for users. We just added audio support (RC releasing in a few days) based only on users request.
https://github.com/allegroai/trains/issues/120
@<1541954607595393024:profile|BattyCrocodile47> not restarting the docker, restarting the Docker service (on Mac it's an app, I think there is an option on the Docker app to do that)
Hi UnevenDolphin73
Can one compare experiments/tasks from different projects?
Yes, the easiest way is to go to the parent project ("all projects" if they have no common parent, then search for the specific Tasks (i.e. filter or using the search bar), then multi-select them.
wdyt?
DrabOwl94 how many 1M files did you end up having ?
Okay, I think this might be a bit of an overkill, but I'll entertain the idea π
Try passing the user as key, and password as secret?
- Could we add a comparison feature directly from the search results (Dashboard view -> search -> highlight some experiments for comparison)?
Totally forgot about the global search feature, hmm I'm not sure the webapp is in the correct "state" for that, i.e. I think that the selection only works in "table view", which is the "all experiments" flat table
- Could we add a filter on the project name in the "All Experiments" project?
You mean "filter by project" ?
Could we ad...
Hi DrabOwl94
I think that if I understand you correctly you have a Lot of chunks (which translates to a lot of links to small 1MB files, because this is how you setup the chunk size). Now apparently you have reached the maximum number of chunks per specific Dataset version (at the end this meta-data is stored in a document with limited size, specifically 16MB).
How many chunks do you have there?
(In other words what's the size of the entire dataset in MBs)
JitteryCoyote63 I think this only holds for the conda distribution.
(Actually quite interesting, I wonder what happens if you already installed cudatoolkit...)
@<1523701099620470784:profile|ElegantCoyote26> what's the target upload? also how come you are uploading a local file and auto deleting it, and then uploading the same one as artifact ?
Hi @<1523703472304689152:profile|UpsetTurkey67>
I circumvented the problem by putting timestamp in task name, but I don't think this is necessary.
Just pass reuse_last_task_id=False
to Task.init, it will never try to reuse them π
None
Maybe the configuration file changed?
None
The logic is if the name and project are the same, and there are no artifacts/models, and the last time it was created was under 72 hours, reuse the Task
BeefyCow3 see this https://allegroai-trains.slack.com/archives/CTK20V944/p1593077204051100 :)
Hi @<1590514584836378624:profile|AmiableSeaturtle81>
I think you should use add_external_files
, instead of add_files
(which is for local files)
None
Is it possibe to launch a task from Machine C to the queue that Machine B's agent is listening to?
Yes, that's the idea
Do I have to have anything installed (aside from theΒ
trains
Β PIP package) on Machine C to do so?
Nothing, pure magic π
AstonishingWorm64
You can turn on the venv cache , it will just handle it's own full env caching π
See here:
https://github.com/allegroai/clearml-agent/blob/4f7407084d1900a79d455570c573e60f40208742/docs/clearml.conf#L100
A few implementation / design details:
When you run code with Trains (and call init) it will record your environment (python packages, git code, uncommitted changes etc) Everything is stored on the Task object in the trains-server, when you clone a task you literally create a copy of the Task object (i.e. a second experiment). on the cloned experiment, you can edit everything (parameters, git, base docker image etc) When you enqueue a Task you add its ID to the execution queue list a trains-a...
(without having to execute it first on Machine C)
Someone some where has to create the definition of the environment...
The easiest to go about it is to execute it one.
You can add to your code the following linetask.execute_remotely(queue_name='default')
This will cause you code to stop running and enqueue itself on a specific queue.
Quite useful if you want to make sure everything works, (like run a single step) then continue on another machine.
Notice that switching between cpu...
Hi BoredSquirrel45
as of today, my required packages aren't being recognized in cloned
Are you saying you are editing the code directly in the cloned Task, then enqueue the Task an the agent does not "auto recognize" the package ?