I'll make sure it is added to the docstring, because apparently it was not there.
DefiantHippopotamus88 you are sending the curl to the wrong port, it should be 9090 (based on what I remember from the unified docker compose) on your setup.
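If it helps, a quick sanity check from Python (a rough sketch; I'm assuming the API server is the service mapped to 9090 on your machine, adjust host/port to your setup):

import requests

# hedged sketch: ping the ClearML API server, assuming it is the one mapped to port 9090
resp = requests.get("http://localhost:9090/debug.ping")
print(resp.status_code, resp.text)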
@<1523707653782507520:profile|MelancholyElk85> I just ran a single step pipeline and it seemed to use the "base_task_id" without cloning it...
Any insight on how to reproduce?
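For reference, this is roughly what I ran (a minimal sketch; the task id is a placeholder):

from clearml.automation import PipelineController

pipe = PipelineController(name="single-step-pipe", project="debug", version="1.0.0")
# a single step that reuses an existing Task as its base
pipe.add_step(
    name="step_1",
    base_task_id="<existing_task_id>",  # placeholder: the Task I expected to be cloned
)
pipe.start_locally(run_pipeline_steps_locally=True)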
Hmm, I cannot think of something that will provide this on a per-user basis.
Wouldn't a global set of credentials that the agent is using be enough?
(On the local machine, the user can keep using the "definitions.py".)
WackyRabbit7 just making sure I understand: MedianPredictionCollector.process_results is called after the pipeline is completed.
Then inside the function, Task.current_task() returns None.
Is this correct?
I have to admit, mounting it to a different drive is a good reason to bring this feature back. The reasoning for removing it was that it means the agent needs to make sure it manages them (e.g. multiple agents running on the same machine).
I'm not sure how the helm chart is built, but do we have a "services queue" in the helm chart?
CourageousKoala93 when you call Task.close() it will mark the task as completed, there is no need to do that manually. The idea with mark_completed is that you can forcefully change the state if needed, or externally stop the task and mark it completed. Make sense?
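Something along these lines (a minimal sketch; the task id is a placeholder):

from clearml import Task

task = Task.init(project_name="examples", task_name="close vs mark_completed")
# ... your code ...
task.close()  # closing the task will also mark it as completed

# forcefully changing the state of another task from the outside:
other = Task.get_task(task_id="<some_task_id>")  # placeholder id
other.mark_completed()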
Hi UnevenOstrich23
if --docker is enabled, does that mean every new experiment will be executed in a dedicated agent worker container?
Correct
I think the missing part is how to specify the docker for the experiment?
If this is the case, in the web UI, clone your experiment (which will create a draft copy that you can edit), then in the Execution tab scroll down to the "base docker image" and specify the docker image to use.
Notice that you can also add flags after the docker im...
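BTW, if you prefer to do it from code rather than the UI, something like this should also work (a rough sketch; the image name and flags are placeholders, and it's worth double-checking the exact signature for your clearml version):

from clearml import Task

task = Task.init(project_name="examples", task_name="docker example")
# placeholder image + extra docker flags
task.set_base_docker("nvidia/cuda:11.0-runtime-ubuntu20.04 --ipc=host")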
No, ClearML uses boto; this is an internal boto error, which points to a bucket size limit, see the error itself.
Okay, fixed. You will be able to override it with output_uri=False (which is ignored on remote execution if you have a project default or Task output URI set in the UI).
Make sense?
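i.e. something like (a minimal sketch):

from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="disable default upload",
    output_uri=False,  # override the default destination; ignored on remote runs
                       # if a project default / Task output_uri is set in the UI
)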
EcstaticGoat95 I can see the experiment, but I cannot access the notebook (I get "Binder inaccessible").
Is this the exact script as here? https://clearml.slack.com/archives/CTK20V944/p1636536308385700?thread_ts=1634910855.059900&cid=CTK20V944
BTW: generally speaking, the default source dir inside a docker will be:
/root/.trains/venvs-builds/<python_version>/task_repository/<repository_name>/
for example:
/root/.trains/venvs-builds/3.6/task_repository/trains.git/
PompousBeetle71, these are CUDA versions; I'm looking for the NVIDIA driver version, for example 440.xx or 418.xx.
The reason is, we set an OS environment variable for the driver, and I remember that old drivers did not support it. Basically they do not support NVIDIA_VISIBLE_DEVICES=all, so I'm trying to see if that's the case; then we could add a fix.
Nice ! 🙂
BTW: clone=True means creating a copy of the running Task, but basically there is no need for that; with clone=False, it will stop the running process and launch it on the remote host, logging everything on the original Task.
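For context, this is roughly the call I mean (a minimal sketch, assuming you trigger it from the running script; the queue name is a placeholder):

from clearml import Task

task = Task.init(project_name="examples", task_name="remote execution")
# stop the local process and relaunch this Task on an agent listening on "default";
# with clone=False everything keeps logging into this same Task
task.execute_remotely(queue_name="default", clone=False, exit_process=True)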
Just dropping this here but I've had some funky compressions with very small datasets!
Odd deflate behavior ...?!
Hi SourOx12
I think that you do not actually need this one:
step = step - cfg.start_epoch + 1
you can just do:
step += 1
ClearML will take care of the offset itself.
Hi GrotesqueOctopus42
In theory it can be built, the main hurdle is getting elk/mongo/redis containers for arm64 ...
StraightDog31 can you elaborate? Where are the parameters stored? Who is trying to access them, and maybe for what purpose?
MysteriousBee56 there is no way to tell the trains-agent to pull from a local copy of your repository...
You might be able to hack it if you copy the entire local repo to the trains-agent version control cache. Would that help you?
SuperiorPanda77 I have to admit, not sure what would cause the slowness only on GCP... (if anything, I would expect the network infrastructure to be faster)
That should spin up an instance, right? (it currently doesn't, and I'm not sure where to debug)
Do you see the AWS scaler Task running?
(This is the code/process that actually spins up a new EC2 instance.)
Should be under Profile -> Workspace (Configuration Vault)
If there was an SSL issue it should log to console right?
Correct; also the agent is able to report, so I'm assuming the configuration is correct.
@<1724960464275771392:profile|DepravedBee82> could you try to put the clearml import + Task.init at the top of your code?
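i.e. something like this at the very top (a minimal sketch; project/task names are placeholders):

from clearml import Task  # keep the clearml import first

task = Task.init(project_name="my_project", task_name="my_task")

# ... only then the rest of your imports / code ...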
GiganticTurtle0, let me add some background. The idea is that at some point you had your code running on your machine (when developing it, for example).
When you actually executed the code in development, you called 'task.init' (to track the development process, for example). This Task.init call did the analysis of the code and python package dependencies and stored it on the Task. Then when you clone the Task, it already lists all the python packages your code directly imports (see "In...
If the manual execution (i.e. PyCharm) was working, it should have stored it on the Pipeline Task.
from time import sleep

import tqdm
from clearml import Task

task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100), dynamic_ncols=True):
    sleep(1)
print('done')

This code snippet works as expected (the console will show the progress at the flush interval, without the values in between). What's the difference?!
Thanks for checking NastyFox63
I double checked with both front/backend, there should not be any limit...
Could you maybe provide a toy demo to reproduce the issue?