UpsetCrocodile10
Does this method expect
my_train_func
to be in the same file as
As long as you can import it and pass it, it should work.
Child exp gets aborted immediately ...
It seems it cannot find the file "main.py"; it assumes all the code is part of a single repository, is that the case? What do you have under the "Execution" tab for the experiment?
Thanks BitterStarfish58!
ReassuredTiger98
will it then be used by the clearml-agent
Yes, I think that in order to make it work, you have to make sure that the agent is also running with TRAINS_LOG_ENVIRONMENT=MYVAR*
Notice that you can use a wildcard or a list of variables that you allow either the clearml SDK or the agent to monitor / change.
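For example, something like this (minimal sketch; the project/task names and the assumption that the SDK picks the variable up at Task.init time are mine, in practice you would usually export the variable in the shell before starting the process and the agent):
```
import os

# allow the SDK / agent to log and override any env var matching this pattern
# (a wildcard or a comma separated list, e.g. "MYVAR*,OTHER_VAR")
os.environ["TRAINS_LOG_ENVIRONMENT"] = "MYVAR*"

from clearml import Task

task = Task.init(project_name="examples", task_name="env logging")  # hypothetical names
# matching environment variables should now show up in the task's configuration,
# and an agent running with the same setting can override them on remote execution
```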
3.a
Regarding the model query, sure from Python or restapi you can query based on any metadata
https://clear.ml/docs/latest/docs/references/sdk/model_model/#modelquery_modelsmodels
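For example (minimal sketch; the project / name / tag values are hypothetical, see the query_models reference above for the full argument list):
```
from clearml import Model

models = Model.query_models(
    project_name="my project",   # hypothetical project
    model_name="resnet",         # partial name match
    tags=["production"],         # filter by tags
    only_published=True,
)
for m in models:
    print(m.id, m.name, m.url)
```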
3.b
If you are using clearml-serving then check the docs / readme, but in a nutshell yes you can.
If the inference code is batch processing, which means a Task, then of course you can launch it, check the clearml agent f...
We're wondering how many on-premise machines we'd like to deprecate.
I think you can see that in the queues tab, no?
MelancholyElk85
How do I add files without uploading them anywhere?
The files themselves need to be packaged into a zip file (so we have an immutable copy of the dataset). This means you cannot "register" existing files (in your example, files on your S3 bucket?!). The idea is to make sure your dataset is protected against changes on the one hand, but on the other to allow you to change it, and only store the changeset.
Does that make sense ?
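For reference, the usual flow looks roughly like this (names and paths are hypothetical):
```
from clearml import Dataset

# create a new dataset version and add local files to it
ds = Dataset.create(dataset_name="my dataset", dataset_project="datasets")
ds.add_files(path="/path/to/local/files")

# package the files into an immutable zipped copy and upload them
# (to the files server by default, or your own bucket via output_url)
ds.upload(output_url="s3://my-bucket/datasets")
ds.finalize()
```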
Hi DepressedChimpanzee34
if you try to extend it more than the width of the column to the right, it doesn't do anything..
You mean outside of the window? or are you saying you cannot extend it?
Just verifying, we are talking about the latest version of clearml-server ?
Hmm I tested on chromium and it seemed to work, let me see if I can reproduce it...
Hi CooperativeFox72
Sure 🙂
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
I managed to do it by using logger.report_scalar, thanks!
Sure, but for future reference, where (in the ignite callbacks) did you add the report_scalar call?
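For future readers, a common way to wire it up looks something like this (a minimal sketch; the handler and the assumption that the engine output is the loss are mine, only report_scalar is the ClearML call):
```
from ignite.engine import Events
from clearml import Logger

def attach_clearml_reporting(trainer):
    # report the running loss to ClearML on every iteration
    @trainer.on(Events.ITERATION_COMPLETED)
    def report_loss(engine):
        Logger.current_logger().report_scalar(
            title="train",
            series="loss",
            value=engine.state.output,       # assumes the engine's output is the loss value
            iteration=engine.state.iteration,
        )
```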
Hi DisgustedDove53
Is redis used as permanent data storage or just cache?
Mostly cache (I think)
Would there be any problems if it is restarted and comes up clean?
Pretty sure it should be fine, why do you ask ?
I should manually copy it to the remote services agents?
The code itself needs to run somewhere, currently this has to be your machine: either you manually run the AWS autoscaler or an agent runs it for you. Make sense?
Exactly, just pointing out the fact that that machine is yours ;)
JitteryCoyote63 How can I reproduce it quickly?
Could it be the credentials are actually incorrect? because it seems like you can access the server? (I assume you were able to browse to it and generate credentials. right?)
Which would mean the error is because of a company firewall/self-signed certificate.
The easiest solution: disable the SSL certificate check for ClearML.
Create the ~/clearml.conf manually:
```
# disable SSL certificate check
api.verify_certificate: False

# copy paste the credentials section from the UI, it should look something like:
api {
    # web_server on port 8080
    web_server: " "
    # Notice: 'api_server' is the api server (default port 8008), not the web server.
    api_server: ...
```
this is very odd, can you post the log?
Sure thing 🙂
BTW: ReassuredTiger98 this is definitely an interesting use case, and I think you can actually write some code to solve it if you like.
Basically let's follow up on your setup:
Machine X: agent listening to queues A and B_machine_a (notice we have two agents here)
Machine Y: agent listening to queue B_machine_b
Now we (the users) will push our jobs into queues A and B
Now we have a service that does the following:
```
see if we have a job in queue B
check if machine Y is working...
```
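Something along these lines could be a starting point for that service (very rough sketch; I'm assuming the APIClient queue/worker calls and fields below, plus hypothetical queue/worker names, so adjust to your setup):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()

def queue_has_pending_jobs(queue_name):
    # assumption: queues.get_all supports filtering by name, and
    # queues.get_by_id returns the queue together with its pending entries
    queue = client.queues.get_all(name=queue_name)[0]
    return len(client.queues.get_by_id(queue=queue.id).entries) > 0

def machine_is_busy(worker_prefix):
    # assumption: workers.get_all reports the currently running task per worker
    return any(
        w.task for w in client.workers.get_all()
        if w.id.startswith(worker_prefix)
    )

# hypothetical names: queue "B" from above, worker ids starting with "machine_y"
if queue_has_pending_jobs("B") and not machine_is_busy("machine_y"):
    # e.g. move the pending job into B_machine_b so machine Y picks it up
    ...
```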
but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
How so?
Hi ConvincingSwan15
A few background questions:
Where is the code that we want to optimize? Do you already have a Task of that code executed?
"find my learning script"
Could you elaborate? Is this connected to the first question?
Does clearml resolve the CUDA Version from driver or conda?
Actually it starts with the default CUDA based on the host driver, but when it installs the conda env it takes it from the "installed packages" (i.e. the one you used to execute the code in the first place)
Regarding the link, I could not find the exact version but this is close enough I guess:
None
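BTW, if you need to force a specific version, I think you can also override the auto-detection in the agent section of clearml.conf (or with the CUDA_VERSION / CUDNN_VERSION environment variables), something like:
```
agent {
    # force a specific CUDA / cuDNN version instead of auto-detecting from the host driver
    # (0 means auto-detect; check the exact value format in your agent's default clearml.conf)
    cuda_version: 11.2
    cudnn_version: 0
}
```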
from the notebook run !ls ~/clearml.conf
I'm sorry JitteryCoyote63 No 😞
I do know that the enterprise edition has these features (a.k.a. vault & permissions), basically to answer these types of situations.
Seems like someone is sitting in the middle and rerouting the request (maybe both https and the port)?!