Hmm ElegantKangaroo44, low memory might explain the behavior.
BTW: 1 = stop request, 3 = Task Aborted/Failed
Which makes sense if it crashed on low memory...
Hi @<1523704198338711552:profile|RoughTiger69>
From this scenario can we assume the "selection" will be tagging the model manually?
Also, how would a human operator decide on the best model, i.e. what is the input to base the decision on?
First, that is awesome to hear, PanickyFish98!
Can you send the full exception? You might be on to something...
2. Actually we thought of it, but could not find a use case; can you expand?
3. I'm not sure I follow, do you mean you expect the first execution to happen immediately?
I am running from a notebook and the cell has returned.
Well the Task will close when you shut down the notebook 🙂
RoundMosquito25 actually you can 🙂
```python
# check the state every minute while the optimization is still running
while not an_optimizer.wait(timeout=1.0):
    running_tasks = an_optimizer.get_active_experiments()
    for task in running_tasks:
        task.get_last_scalar_metrics()
        # do something here
```
Baseline reference:
https://github.com/allegroai/clearml/blob/f5700728837188d7d6005726c581c9d74fd91164/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L127
Hi @<1545216070686609408:profile|EnthusiasticCow4> let me know if this one solves the issue
pip install clearml==1.14.2rc0
Yes, it worked. Thank you very much.
ScantCrab97 nice!
It was indeed a matter of the subnets...
BrightRabbit75 you are awesome, thank you!
(now we probably need to add it to the faq somewhere?!)
Oh right, I missed the fact that the helper functions are also decorated; yes, it makes sense we add the tags as well.
Regarding nested pipelines, I think my main question is: are they independent, or are we generating everything from the same code base?
@<1540142651142049792:profile|BurlyHorse22> do you mean the one referred to in the video? (I think this is the raw data on Kaggle)
So you could change it down the road if infra/hosting changes.
Internally this is doable and Enterprise edition supports it, at the end this is stored in DBs 🙂
Also in this case, I'm uploading the data to the public file server URL, but my k8s pod can't reach it for security reasons.
Yes, this is solvable as well (again, sorry for pointing it out, but only in the enterprise version), where you can specify per client or globally:
```
path_substitution = [
    # Replace regis...
```
In theory it should have worked.
Can you send me the full Task log? (with cache and everything?)
I suspect since these are not the default folders, something is misconfigured / missing
(you can DM the log, so it won't end up on a public channel)
BTW, has this at the bottom:
Yes, it is the company's legal entity name. But I think that for referencing it makes more sense to mention the product name, ClearML.
I think this looks good 🙂
I still wonder how no one noticed... (maybe 100 unique title/series reports is a relatively high threshold)
I was using clearml == 0.17.5 and I also had this issue
I think it was introduced when we moved to subprocess reporting, with 0.17.5
You can disable it with the following in clearml.conf:
```
sdk.development.report_use_subprocess = false
```
WickedGoat98 nice!!
Can you also pass the login screen (i.e. can you access the API server)?
This is odd; how are you spinning up clearml-serving?
You can also do it synchronously:
```python
predict_a = self.send_request(endpoint="/test_model_sklearn_a/", version=None, data=data)
predict_b = self.send_request(endpoint="/test_model_sklearn_b/", version=None, data=data)
```
Wait, why aren't you just calling Popen (or os.system)? I'm not sure how it relates to the torch multiprocessing example. What am I missing?
HugeArcticwolf77 oh no, I think you are correct 😞
Do you want to quickly PR a fix ?
If it cannot find the Task ID I'm guessing it is trying to connect to the demo server and not your server (i.e. configuration is missing)
Ohh okay, something seems to half-work in terms of configuration: the agent has enough configuration to register itself, but fails to pass it to the task.
Can you test with the latest agent RC: 0.17.2rc4
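i.e. (via pip):
```bash
pip install clearml-agent==0.17.2rc4
```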
You need to mount it to ~/clearml.conf
(i.e. /root/clearml.conf)
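For example, something like this should work (a minimal sketch; the image name and host path are assumptions):
```bash
docker run -v $HOME/clearml.conf:/root/clearml.conf allegroai/clearml-agent
```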
So the way it works, anything in the `extra_docker_shell_script` section is executed inside the container every time the container spins up. I'm thinking the `extra_docker_shell_script` will pull the environment file from an S3 bucket and apply all "secrets" (or the secrets are embedded into the startup bash script, like `export AWS_SECRET=abcdef`); that said, this will not be on a per-user basis 😞
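Something along these lines in the agent section of clearml.conf (a sketch; the bucket path and file names are hypothetical, and this assumes the lines run as one startup script so the sourced variables persist):
```
agent {
    # every line here is executed inside the container on each spin-up
    extra_docker_shell_script: [
        "aws s3 cp s3://my-bucket/secrets.env /tmp/secrets.env",
        ". /tmp/secrets.env",
    ]
}
```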
Does that help?
DilapidatedDucks58 use a full link, without the package name, i.e. starting with `git+`
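For example, in a requirements file this looks like (hypothetical repository path):
```
git+https://github.com/<org>/<repo>.git
```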
any chance StorageManager could re-download files only if their size differs from the file in the cache (as an option)?
I think there is a force argument to force the download.
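To always re-download regardless of the cache, something like this should work (a sketch; the URL is hypothetical, and the `force_download` argument name is worth double-checking for your clearml version):
```python
from clearml import StorageManager

# re-download even if a copy already exists in the local cache
local_path = StorageManager.get_local_copy(
    remote_url="s3://my-bucket/data/file.zip",  # hypothetical URL
    force_download=True,
)
```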
I think the main issue is getting the size from the different backends (i.e. S3 / HTTPS / etc.)
Maybe we should add it as a GitHub feature request issue?
The main limitation is that the driver "list()" does not return file size.
For example it might be an issue with the default http files-server.
wdyt?
Hi @<1600661423610925056:profile|StrongMouse81>
Using the serving base URL, and also another model endpoint we added using `clearml-serving model add`, we get the attached response:
And other model endpoints are working for you?
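For reference, a `model add` call usually looks something like this (flags per the clearml-serving README; the service ID, endpoint, names, project, and preprocess path are placeholders):
```bash
clearml-serving --id <service-id> model add \
    --engine sklearn \
    --endpoint "test_model_sklearn_a" \
    --preprocess "preprocess.py" \
    --name "model a" \
    --project "serving examples"
```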
@<1545216077846286336:profile|DistraughtSquirrel81> shoot an email to support@clear.ml and provide all the information you can about the "lost account" (i.e. the one you had the data on); this means the email account that created it (or your colleagues' emails), and any other information that might help locate it.