I was just wondering if instead of using local subprocesses, several agents could serve the same purpose (running several pipelines concurrently)
wouldn't --service-mode
(read as multiple simultaneous Tasks on the same agent) solve the issue?
(BTW: if you set the pipeline component target queue to "services", this is exactly what will happen)
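For reference, a minimal sketch of what that looks like with the decorator API (project/step names here are placeholders; execution_queue is the relevant argument, adjust to your pipeline):
```
from clearml.automation.controller import PipelineDecorator

# run this step as a service-mode task by targeting the "services" queue
@PipelineDecorator.component(return_values=["data"], execution_queue="services")
def prepare_data():
    data = list(range(10))
    return data

@PipelineDecorator.pipeline(name="example-pipeline", project="examples", version="1.0.0")
def run_pipeline():
    data = prepare_data()

if __name__ == "__main__":
    run_pipeline()
```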
Hi GiganticTurtle0
ClearML will only list the directly imported packages (not their requirements), meaning in your case it will only list "tf_funcs" (which you imported).
But I do not think there is a package named "tf_funcs", right?
, it's just a custom module.
Is this your own module ? Is this a local folder we import from ?
Hi! I was wondering why ClearML recognizes Scikit-learn scalers as Input Models...
Hi GiganticTurtle0
any joblib.load/save is logged by clearml (it cannot actually differentiate what it is used for ...)
You can of course disable it with Task.init(..., auto_connect_frameworks={'joblib': False})
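A minimal sketch of that call, for reference:
```
from clearml import Task

# disable only the joblib auto-logging; other frameworks stay auto-connected
task = Task.init(
    project_name="examples",
    task_name="no joblib logging",
    auto_connect_frameworks={"joblib": False},
)
```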
The new parameter `abort_on_failed_steps` could be a list containing the name of the ...
I like that, we can also have it as an argument per step (i.e. the decorator can say, abort_pipeline_on_fail or continue_pipeline_processing)
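For illustration only, assuming a per-step continue_on_fail style argument on the component decorator (check your clearml version; this is a sketch of the idea, not a confirmed API):
```
# hypothetical per-step behavior: keep the pipeline running even if this step fails
@PipelineDecorator.component(return_values=["model"], continue_on_fail=True)
def train_step(data):
    ...
```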
GiganticTurtle0 is it in the same repository ?
If it is, it should have detected that it needs to analyze the entire repository (not just the standalone script) and then discovered tensorflow
Correct.
It starts with the initial script (entry point); if it is self-contained (i.e. does not interact with the rest of the repo) it will only analyze that script, otherwise it will analyze the entire repo's code.
So was definitely related to the symlinks in some form
could it be it actually deleted the cache? How many agents are running on the same machine ?
Is there a way to filter experiments in a hyperparameter sweep based on a given range of a parameter/metric in the UI?
Are you referring to the HPO example? or the Task comparison ?
Hi UnsightlySeagull42
Basically you can get the agent to always add additional arguments for the docker run, such as -v for mounting:
https://github.com/allegroai/clearml-agent/blob/948fc4c6ce1ecf33a74619ad570d69b8188f6db9/docs/clearml.conf#L133
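For example, something along these lines in the agent section of clearml.conf (the mount path is a placeholder):
```
agent {
    # extra arguments appended to every "docker run" the agent launches
    extra_docker_arguments: ["-v", "/host/data:/mnt/data"]
}
```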
NastyFox63 ask SuccessfulKoala55 tomorrow, I think there is a way to change the default settings even with the current version.
(I.e. increase the default 100 entries limit)
others from the local environment and this causes a conflict when importing the attr module
Inside the docker ? " local environment" ?
This is all under "root" no?
Hi JitteryCoyote63
So the main issue is backing up the Elastic & Mongo DBs while they are running; once they are backed up/restored, the server will spin up as is. (Let me check regarding Redis, it might be that since it is used for caching there is no need to actually back up the content, only the configuration)
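A rough sketch of the offline backup flow, assuming a default docker-compose deployment with data under /opt/clearml (adjust paths to your setup):
```
# stop the server so Elastic/Mongo files are not being written to
docker-compose -f /opt/clearml/docker-compose.yml down

# back up data and configuration
sudo tar czvf ~/clearml_backup_data.tgz -C /opt/clearml/data .
sudo tar czvf ~/clearml_backup_config.tgz -C /opt/clearml/config .

# bring the server back up
docker-compose -f /opt/clearml/docker-compose.yml up -d
```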
SmallBluewhale13 in your code, what are you getting when you print the version:
```
from clearml import __version__
print(__version__)
```
Ohh so the setup.py is the one containing these requirements, oops I totally missed that :( let me check what PEP has to say about that ... (Basically this is not a clearml issue but a pip one...)
Hi FancyWhale93, in your clearml.conf configure the default output URI; you can specify the file server as default, or any object storage:
https://github.com/allegroai/clearml-agent/blob/9054ea37c2ef9152f8eca18ee4173893784c5f95/docs/clearml.conf#L409
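For example, in clearml.conf (the bucket/path is a placeholder):
```
sdk {
    development {
        # default destination for model/artifact uploads
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```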
JitteryCoyote63 How can I reproduce it quickly?
Any other port that could be open? (if SSH is already open we cannot launch another daemon on the same port)
ConvolutedChicken69
basically clearml-data needs to store an immutable copy of the delta changes per version; if the files are already uploaded, there is a good chance they could be modified...
So in order to make sure you have a clean immutable copy, it will always upload the data (notice it also packages everything into a single zip file, so it is easy to manage).
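For context, a minimal sketch of the versioning flow on the CLI (project/dataset names and paths are placeholders):
```
clearml-data create --project "Datasets" --name "my-dataset"
clearml-data add --files ./data_folder
clearml-data upload    # packages the delta into a zip and uploads it
clearml-data close     # finalize this immutable version
```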
I guess I would need to put this in the extra_vm_bash_script param of the auto-scaler, but it will reboot in a loop, right? Isn't there an easier way to achieve that?
You can edit the extra_vm_bash_script, which means the next time the instance is booted the bash script will be executed.
In the meantime, you can ssh to the running instance and change the ulimit manually, wdyt?
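A rough sketch of what the extra_vm_bash_script could contain for the ulimit change, assuming it is the open-files limit you need to raise (values are placeholders):
```
# raise the open-files limit for future sessions on the instance
echo '* soft nofile 65535' >> /etc/security/limits.conf
echo '* hard nofile 65535' >> /etc/security/limits.conf
```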
Yes that should work. The only thing is you need to call Task.init on the master process (and make sure you call Task.current_task() on the subprocesses, if you want the automagic to kick in). That said, usually there is no need, they are supposed to report everything back to the main one anyhow.
basically
```
@call_parse
def main(
    gpus:Param("The GPUs to use for distributed training", str)='all',
    script:Param("Script to run", str, opt=False)='',
    args:Param("Args to pass to script", nargs=...
```
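A minimal sketch of the master/subprocess pattern described above (illustrative only, not the fastai launcher itself; with fork-based multiprocessing the subprocesses can pick up the master's task):
```
from multiprocessing import Process
from clearml import Task

def worker(rank):
    # pick up the task created by the master process
    task = Task.current_task()
    task.get_logger().report_scalar("rank", "value", value=rank, iteration=0)

if __name__ == "__main__":
    # Task.init is called once, on the master process
    Task.init(project_name="examples", task_name="distributed example")
    procs = [Process(target=worker, args=(r,)) for r in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```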
MelancholyBeetle72 I think we collect them in Issue 81 on GitHub, feel free to add it if it is missing 🙂
https://github.com/allegroai/clearml/issues/81
Can you also share the full log? The numbers seem off (and clearml cannot actually "invent" those numbers, they are coming from somewhere...)
@<1569858449813016576:profile|JumpyRaven4> fyi clearml-serving was synced 🤞
For example:
```
examples/k8s_glue_example.py --queue k8s_gpu --namespace <namespace> --pod-clearml-conf ~/trains.conf --template-yaml example/base.yml
```
Actually it hasn't changed ...
@<1545216070686609408:profile|EnthusiasticCow4>
Is there currently a way to bind the same GPU to multiple queues? I believe the agent complained last time I tried (which was a while ago)
You can run multiple agents on the same GPU, e.g.:
```
CLEARML_WORKER_NAME=host-gpu0a clearml-agent daemon --gpus 0
CLEARML_WORKER_NAME=host-gpu0b clearml-agent daemon --gpus 0
```
Hi UnevenHorse85
As far as I understand, users use the logins and passwords specified in config/apiserver.conf to access the webserver UI, and the key/secret key from their local ~/clearml.conf to access the apiserver.
Correct 🙂
access apiserver. What is the use of all other security keys
To be able to configure the SDK client (i.e. the clearml package) from OS environment variables instead of the clearml.conf file.
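For example, the standard ClearML environment variables (values are placeholders):
```
export CLEARML_API_HOST="https://api.clear.ml"
export CLEARML_WEB_HOST="https://app.clear.ml"
export CLEARML_FILES_HOST="https://files.clear.ml"
export CLEARML_API_ACCESS_KEY="<access_key>"
export CLEARML_API_SECRET_KEY="<secret_key>"
```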