WittyOwl57 what about vm.max_map_count?
echo "vm.max_map_count=262144" > /tmp/99-clearml.conf
sudo mv /tmp/99-clearml.conf /etc/sysctl.d/99-clearml.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_linux_mac
Seems like a credentials error.
Do you have everything set up correctly in your ~/clearml.conf?
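For reference, the api section of ~/clearml.conf usually looks something like this (the server URLs and keys below are placeholders, use the ones generated for your server/account):
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        "access_key" = "YOUR_ACCESS_KEY"
        "secret_key" = "YOUR_SECRET_KEY"
    }
}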
Hi WorriedParrot51 , what do you mean by "call get_parameters_as_dict() from agent" ?
Do you mean like change the trains-agent to run the task differently?
Or inside your code while the trains agent runs it?
From the code itself (regardless of how you run it) you can always call task.get_parameters_as_dict() and get the current state of the parameters (i.e. from the backend if running with trains-agent, or copied from the code, if running manually).
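For example (a minimal sketch, assuming the trains package; project/task names are placeholders):
from trains import Task

task = Task.init(project_name='examples', task_name='params demo')
# when executed by trains-agent this returns the parameters stored in the backend,
# when executed manually it returns the parameters connected from the code
params = task.get_parameters_as_dict()
print(params)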
Yeah @<1689446563463565312:profile|SmallTurkey79> is right, reverting to image is the safest way to get exactly the same...
btw, @<1791277437087125504:profile|BrightDog7> if you can produce a standalone example of reporting the data, we can probably fix whatever is broken in the auto convert, or at least revert to image based automatically (basically if the plot is simple enough it will try to convert it, otherwise it will automatically revert to image internally)
Hi @<1590514584836378624:profile|AmiableSeaturtle81>
I think you should use add_external_files instead of add_files (which is for local files).
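A minimal sketch of the difference (bucket/paths are placeholders):
from clearml import Dataset

ds = Dataset.create(dataset_project='examples', dataset_name='my dataset')
ds.add_files(path='/local/data')                          # local files, uploaded to the dataset storage
ds.add_external_files(source_url='s3://my-bucket/data/')  # remote files, only the links are registered
ds.upload()
ds.finalize()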
SoreDragonfly16 could you reproduce the issue?
What's your OS? trains versions?
Hmm so the Task.init should be called on the main process, this way the subprocess knows the Task is already created (you can call Task.init twice to get the task object). I wonder if we somehow can communicate between the sub processes without initializing in the main one...
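To illustrate the first part (a minimal sketch; project/task names are placeholders):
from multiprocessing import Process
from clearml import Task

def worker():
    # per the note above: calling Task.init a second time (here, in the subprocess)
    # returns the Task object that was already created in the main process
    task = Task.init(project_name='examples', task_name='subprocess demo')
    task.get_logger().report_text('hello from the subprocess')

if __name__ == '__main__':
    Task.init(project_name='examples', task_name='subprocess demo')
    p = Process(target=worker)
    p.start()
    p.join()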
Right! I just noticed that! This is odd... and yes, it definitely has something to do with the multi pipeline executed on the agent, I think I know what to look for ...
(just making sure, again: running_locally produced exactly what we were expecting, is that correct?)
I'm using the default operation mode which uses kubectl run. Should I use templates and specify a service in there to be able to connect to the pods?
Ohh, the default "kubectl run" does not support the "ports-mode" 🙂
There's a static number of pods which services are created for…
You got it! 🙂
i.e. run:
pip install --upgrade trains
PompousBeetle71 a few questions:
is this like using PyTorch distributed, only manually? Why don't you call trains.init in all the sub processes? We had a few threads on that, it seems like a recurring question, I'll make sure we have an example on GitHub. Basically trains will take care of passing the arg-parser commands to the sub processes, and also of the torch node settings. It will also make sure they all report to the same experiment. What do you think?
The main reason to add the timeout is because the warning was annoying to users 🙂
The secondary reason was that clearml will start reporting based on seconds from start, then once iterations begin it reverts back to iterations. But if the iterations are actually "epochs", the numbers are lower, so you end up with a graph that does not match the expected "iterations" x-axis. Make sense?
WorriedParrot51 trains should support subparsers etc.
Even if your code calls the parsing before trains is initialized.
The only thing you need is to have the package imported by the time argparse parsing is called (not to initialize it, that can happen later).
It should (hopefully) solve the issue.
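A minimal sketch of that ordering (the subparser and arguments are just placeholders):
import argparse
import trains  # imported before the arguments are parsed; initialization happens later

parser = argparse.ArgumentParser()
sub = parser.add_subparsers(dest='command')
train = sub.add_parser('train')
train.add_argument('--lr', type=float, default=0.001)
args = parser.parse_args()

# Task.init can be called after the parsing
task = trains.Task.init(project_name='examples', task_name='subparser demo')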
AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION
Hi DisgustedDove53
Now for the clearml-session tasks, a port-forward should be done each time if I need to access the Jupyter notebook UI for example.
So basically this is why the k8s glue has --ports-mode.
Essentially you set up a k8s service (handling the incoming TCP ports), then the template.yaml used by the k8s glue should specify said service. Then clearml-session knows how to access the actual pod via the parameters the k8s glue sets on the Task.
Make sense?
Hi SpotlessFish46 ,
Is the artifact already in S3 ?
Is S3 configured as the default files_server in the trains.conf?
You can always use the StorageManager to upload to wherever you want and register the URL on the artifacts.
You can also programmatically change the artifact destination server to S3, then upload the artifact as usual.
What would be the best match for you?
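For reference, the two options look roughly like this (shown with current clearml naming; bucket paths are placeholders):
from clearml import Task, StorageManager

# option 2: change the artifact destination to S3, then upload as usual
task = Task.init(project_name='examples', task_name='artifact demo',
                 output_uri='s3://my-bucket/artifacts')
task.upload_artifact('my_data', artifact_object='/tmp/data.csv')

# option 1: upload manually with StorageManager and register the returned URL
url = StorageManager.upload_file(local_file='/tmp/data.csv',
                                 remote_url='s3://my-bucket/artifacts/data.csv')
task.upload_artifact('my_data_link', artifact_object=url)  # exact link handling may depend on the SDK version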
I guess we should have obfuscated the name better 🙂
Hi LazyFox65
So the idea is that you add two lines of code to your codebase:
from clearml import Task
task = Task.init(project_name='examples', task_name='change me')
And you run it once, then it will create the experiment, environment arguments etc.
Now that you have it in the UI you can clone / change all the fields and send for execution.
That said you can also create an experiment from CLI (basically pointing to a repo and entry point)
You can read here:
https://github.com/allegroa...
BTW: I think we had a better example, I'll try to look for one
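For the CLI route mentioned above, it looks roughly like this (repo/script/queue are placeholders):
clearml-task --project examples --name remote-run --repo https://github.com/user/repo.git --branch main --script train.py --queue default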
When are those keys used?
They are the default keys for internal access, basically just make up something, otherwise someone could access the server with the default keys
BattyLion34 I have a theory, I think that any Task on the "default" queue will fail if a Task is running on the "services" queue.
Could you create a toy Task that just prints "." and sleeps for 5 seconds and then prints again?
Then while that Task is running, from the UI launch the Task that passed on the "default" queue. If my theory holds it should fail, then we will be getting somewhere 🙂
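Something along these lines would do as the toy Task (a minimal sketch; project/task names are placeholders):
import time
from clearml import Task

task = Task.init(project_name='examples', task_name='toy sleep task')
for _ in range(60):
    print('.')
    time.sleep(5)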
Looking at this example here, it looks like it only works with tasks:
Aha! Pipeline is a Task 🙂 (a specific type of Task, nonetheless a Task)
Just use the pipeline ID, and make sure you push it into the services queue, voila
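i.e. something along these lines (the ID is a placeholder):
from clearml import Task

# enqueue the pipeline (controller) Task into the services queue
Task.enqueue(task='<pipeline_task_id>', queue_name='services')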
No worries 🙂 glad to hear it worked out 🙂
a bit sad that there is no working integration with one of the leading time series framework...
You mean a series Darts reports? If it does report it, where does it do so? Are you suggesting we add a Darts integration (which sounds like a good idea)?
Hi UnevenHorse85
As far as I understand, users use logins and passwords specified in config/apiserver.conf to access webserver UI and key/secret key from their local ~/clearml.conf to access apiserver.
Correct 🙂
access apiserver. What is the use of all other security keys
To be able to configure the SDK client (i.e. clearml package) from OS environment and not clearml.conf file
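For example, instead of ~/clearml.conf you could export (values are placeholders):
export CLEARML_API_HOST=https://api.clear.ml
export CLEARML_WEB_HOST=https://app.clear.ml
export CLEARML_FILES_HOST=https://files.clear.ml
export CLEARML_API_ACCESS_KEY=<access_key>
export CLEARML_API_SECRET_KEY=<secret_key>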
CrookedWalrus33
Force SSH git authentication; it will auto-mount the .ssh from the host into the docker container
https://github.com/allegroai/clearml-agent/blob/6c5087e425bcc9911c78751e2a6ae3e1c0640180/docs/clearml.conf#L25
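If I'm pointing at the right line, that's the agent's force_git_ssh_protocol setting, i.e. something like:
agent {
    # force SSH for git cloning; the agent will also mount ~/.ssh from the host into the docker
    force_git_ssh_protocol: true
}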
If you set the TMP env variable you can control the tmp folder. Would that work?
Hi VexedCat68
Are we talking YouTube videos? Docs? Courses?