Hi JitteryCoyote63
If you want to stop the Task, click Abort (Reset will not stop the task or restart it, it will just clear the outputs and let you edit the Task itself). I think we witnessed something like that due to DataLoader multiprocessing issues, and I think the solution was to add multiprocessing_context='forkserver' to the DataLoader: https://github.com/allegroai/clearml/issues/207#issuecomment-702422291
Could you verify?
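For reference, here is roughly what that workaround looks like (a minimal sketch, assuming a standard PyTorch Dataset; the dataset name and loader settings are just illustrative):
from torch.utils.data import DataLoader

loader = DataLoader(
    my_dataset,                             # your existing Dataset instance (illustrative)
    batch_size=32,
    num_workers=4,
    multiprocessing_context='forkserver',   # the workaround from the linked issue
)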
AttributeError: 'NoneType' object has no attribute 'base_url'
can you print the model object?
(I think the error is a bit cryptic, but generally it might mean the model is missing an actual URL link?)
print(model.id, model.name, model.url)
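And as a quick sanity check (just a sketch using the same attributes):
if model.url is None:
    print('Model has no URL registered:', model.id, model.name)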
Basically what I want is a clearml-session but with a docker container running JupyterHub instead of JupyterLab.
I missed that 🙂
The idea of clearml-session is to launch a container with jupyterlab (or vscode) on a remote machine, and connect the user's machine (i.e. the machine that executed the clearml-session CLI) directly into the container.
Replacing the jupyterlab with JupyterHub would be meaningless here, because the idea is that it spins an instance (contai...
GrievingTurkey78 I'm not sure I follow, are you asking how to add additional scalars ?
Sure thing, any specific reason for querying on multi pod per GPU?
Is this for remote development process ?
BTW: the funny thing is, on bare metal machines multi GPU works out of the box, and deploying it with bare metal clearml-agents is very simple
Hi AbruptWorm50
the second "epoch loss" is the scalar for the "validation" process (see "validation: epoch loss" series is actually the TF file/folder prefix automatically added)
Make sense ?
EcstaticGoat95 any chance you have an idea on how to reproduce? (even 1 out of 6 is a good start)
ShaggyHare67 in the HPO the learning rate parameter should be (based on the above):
General/training_config/optimizer_params/Adam/learning_rate
Notice the "General" prefix (and notice it is case sensitive)
ShaggyHare67 I'm just making sure I understand the setup:
First "manual" run of the base experiment. It creates an experiment in the system, you see all the hyper parameters under General section. trains-agent
running on a machine HPO example is executed with the above HP as optimization paamateres HPO creates clones of the original experiment, with different configurations (verified in the UI) trains-agent executes said experiments, aand they are not completed.But it seems the paramete...
Hi JuicyFox94 ,
Actually we just added that 🙂 (still on GitHub, RC soon)
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L696
I see in the UI there are 5 drafts
What's the status of these 5 experiments? draft ?
Hi GiganticTurtle0
I have found that clearml does not automatically detect the imports specified within the decorated function
The pipeline decorator will automatically detect the imports inside the function, but not outside (i.e. global), to allow better control of packages (think for example one step needs the huge torch package, and the other does not).
Make sense ?
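In other words, something like this (a minimal sketch; the step body and package are just examples):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['result'])
def heavy_step(x):
    # imported inside the function -> detected as a requirement of this step only
    import numpy as np
    return float(np.sqrt(x))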
How can I tell clearml I will use the same virtual environment in all steps...
If you set the package_manager to poetry then it will only use the lock files
https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L53
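i.e. in the agent's clearml.conf (key name as in the linked config reference):
agent.package_manager.type: poetry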
If you clear the "Installed Packages" section, it will just use the "requirements.txt" in the repository itself.
What's the specific use case, and the problem we are trying to solve?
How can I ensure that additional tasks aren't created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them ?
(Also, is the git diff warning still there with the latest clearml? I think there was some fix related to that)
So this is optuna 🙂 The idea is it will test which parameters have potential (with early stopping), then launch a subset of the selected parameters
did you run trains-agent ?
So obviously that is the problem
Correct.
ShaggyHare67 how come the "installed packages" are now empty ?
They should be automatically filled when executing locally?!
Any chance someone mistakenly deleted them?
Regarding the python environment, trains-agent is creating a new clean venv for every experiment; if you need, you can set in your trains.conf:
agent.package_manager.system_site_packages: true
https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c67...
Setting the credentials on the agent machine means the users cannot use their own credentials, since a k8s glue agent serves multiple users.
Correct, I think "vault" option is only available on the paid tier π
but how should we do this for the credentials?
I'm not sure how to pass them; wouldn't it make sense to give the agent all-access credentials?
ShaggyHare67 could you send the console log that trains-agent outputs when you run it?
Now the trains-agent is running my code but it is unable to import trains
Do you have the package "trains" listed under "installed packages" in your experiment?
I aborted the task because of a bug on my side
🙂
Following this one, is treating abort as failed a must-have feature for the pipeline (in your case), or is it sort of a bug in your opinion?
Hi TrickyRaccoon92
TKinter is suddenly used as backend, and instead of writes to the dashboard I get popups per figure.
Are you running with an agent or manually executing the code ?
DeliciousBluewhale87 this is exactly how it works,
The glue puts a k8s job with the requested docker image (the one on the Task), the job itself (k8s job) starts the agent inside the requested docker, then the agent inside the docker will install all the required packages.
WackyRabbit7 you can configure the AWS autoscaler with two types of instances, with priority to one of them. So in theory you do not need two autoscaler processes; with that in mind I "think" a single IAM should suffice
Do you have python 3.7 in the docker ?
SmugOx94 could you please open a GitHub issue with this request, otherwise we might forget 🙂
We might also get some feedback from other users
RattySeagull0 I think you are correct, python 3.6 is the one installed inside the docker. Is it important to have 3.7? You might need another docker (or change the installation script and install python 3.7 inside)
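A quick way to check which python the image ships (the image name here is a placeholder):
docker run --rm <your-docker-image> python3 --version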