Reputation
Badges 1
25 × Eureka!trains was not able to pick the right wheel when I updated the torch req from 1.3.1 to 1.7.0: It downloaded wheel for cuda version 101.
Could you send a log, it should have worked π
Hi GrievingTurkey78
I think it is already fixed with 0.17.5, no?
TenseOstrich47 FYI:
This might what you are looking for π
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L61
Debug samples can only be controlled via api.file_server (or programatically)
Model/Artifacts see above
This has no effect. I am not able to change the files_sever, e.g. I can not change from
You are Not changing the files_server just where your Taskj uploads Models/Artifacts, these are two diff things (and again Only applies to Artifacts/Models)
Or you can do:
param={'key': 123}
task.connect(param)
I couldn't change the task status from draft to complete
Task.completed(ignore_errors=True)
@<1523701523954012160:profile|ShallowCormorant89> can you verify it is reproducible in 1.9.3 ? because if it is I'd like to fix that π
will it be possible for us to configure the "new run" button in a way so that it always clones from a particular pipeline ?
What do you mean by "particular pipeline" ? by default it will clone the last successful one, and by right clicking a specific one you can run a copy of that one. what am I missing ?
. but when we try to do a "New Run" from UI, it tries to follow the DAG of previous run (the run with all child nodes skipped) and the new run fails too.
This is odd, is this reproducible ? what's the clearml python package version ?
agree, but setting the agentβs env variable TMPDIR
I think this needs to be passed to the docker with -e TMPDIR=/new/tmp as additional container args:
see example
None
wdyt?
yes
argument saying always create from code
can be helpful
@<1523701523954012160:profile|ShallowCormorant89> any chance you can open a github issue on that, just so we do not forget ?
if we can edit the configuration objects of a pipeline, that can be beneficial too. which we're unable to do from UI
Actually you already can, after you clone the pipeline, you can press on details then go to configuration Tab, and edit the pipeline object. The format is HOCON (...
I see, so in theory you could call add_step with a pipeline parameter (i.e. pipe.add_parameter etc.)
But currently the implementation is such that if you are starting the pipeline from the UI
(i.e. rerunning it with a different argument), the pipeline DAG is deserialized from the Pipeline Task (the idea that one could control the entire DAG externally without changing the code)
I think a good idea would be to actually allow the pipeline class to have an argument saying always create from cod...
Hi @<1523701523954012160:profile|ShallowCormorant89>
This is generally based on number of agents, or am I missing something ? Also is it based on Task or decorated functions ?
Hi @<1523701523954012160:profile|ShallowCormorant89>
This means the system did not detect any "iteration" reporting (think scalars) and it needs a time-series axis for the monitoring, so it just uses seconds from start
instead of terminating them once they are inactive, so that they could be available immediately when they are needed.
JitteryCoyote63 I think you can increase the IDLE timeout on the autoscaler, and achive the same behavior, no ?
MistakenBee55 how about a Task doing the Model quantization, then trigger it with TriggerScheduler ?
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
Hi UnsightlySeagull42
Basically you can get the agent to always add additional arguments for the docker run, such as -v for mounting:
https://github.com/allegroai/clearml-agent/blob/948fc4c6ce1ecf33a74619ad570d69b8188f6db9/docs/clearml.conf#L133
BTW from the log you attached:
File "/root/.clearml/venvs-builds/3.6/lib/python3.6/site-packages/clearml/storage/helper.py", line 218, in StorageHelper
_gs_configurations = GSBucketConfigurations.from_config(config.get('google.storage', {}))
This means it tries to remove an artifact from a Task, that artifact is probably in GS (i'm assuming because it is using the GS api), and the cleanup service is missing the GS configuraiton.
WackyRabbit7 is that possible ?
Oh no, you are absolutely correct, it is broken (I mean I have no idea why it lists Hydra, or how it got there). I will let the guys know and fix it.
Bottom line, after you clone it, please edit the installed packages and remove the "Hydra" line and replace with just "hydra-core" (no need for version).
The format is the same as "requirements.txt" and will effect the venv created by the agent
is there a code that can reproduce it ?
Hi, I was expecting to see the container rather then the actual physical machine.
It is the container, it should tunnels directly into it. (or that's how it should be).
SSH port 10022
Hmm if this is case, you can add some prints in here:
None
the service/action will tell you what you are sending
wdyt?
Hi @<1578555761724755968:profile|GrievingKoala83>
mount s3 as a cache folder
I'm not sure that would be fast enough for cache ...
How to override
/root/.cache/pip
path?
in your clearml.conf fille:
None
then set it to your PV
LazyTurkey38 notice the assumption is that the docker entry-point ends with bash, and only then the agent take charge. I'm assuming this is not te case hence the agent spins the docker, then the docker just ends, could that be?
Thanks CynicalBee90 I appreciate the discussion! since I'm assuming you will actually amend the misrepresentation in your table, let me followup here.
1.
SPSS license may be a significant consideration for some, and so we thought it was important to point this out clearly.
SPSS is fully open-source compliant unless you have the intention of selling it as a service, I hardly think this is any users consideration, just like anyone would be using mongodb or elastic search without think...
Intersting!
I would also add that Task name is not unique and you can use to describe the "process / goal etc" which would make it pretty obvious to search / review from the UI.
Regrading models and branchs, Iw ould use the Task tags (you can have as many as you like) to tag the specific model type (or dev branch if the alg is diff), this means you can also easily filter based on the Tags in the UI.
can you use the Web UI to compare the artifacts from two separate subprojects?
Yes comp...
any chance StorageManager could re-download files only if their size is different from file in cache (as an option)?
I think there is force argument, to force download.
I think the main issue is getting the size from different backends (i.e. s3 /https / etc.)
Maybe we should add it as a GitHub feature request issue?
The main limitation is that the driver "list()" does not return file size.
For example it might be an issue with the default http files-server.
wdyt?
Hi QuaintPelican38
Assuming you have open the default SSH port 10022 on the ec2 instance (and assuming the AWS premissions are set so that you can access it). You need to use the --public-ip flag when running the clearml-session. Otherwise it "thinks" it is running on a local network and it registers itself with the local IP. With the flag on it gets the public IP of the machine, then the clearml-session running on your machine can connect to it.
Make sense ?
Queues can have multiple workers, and that implies multiple instances of a task can run concurrently.
@<1533619716533260288:profile|SmallPigeon24> as long as these are the Exact same instances you can have them runing simultaneously (think multi node training), that said each one should "know" not to report over the others, because of course it will overwrite the reports.
Back to your point on multiple agents:
You cannot have two Tasks in the same queue, that means that a single agen...