Hi GrittyHawk31
this one?
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server
Hi @<1544853721739956224:profile|QuizzicalFox36>
Sure, just change the ports in the docker-compose file
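For example (a sketch only; match the service names to your own docker-compose.yml, and the defaults may differ slightly between versions), the left-hand side of each mapping is the host port, so changing it is enough:
services:
  webserver:
    ports:
      - "9090:80"     # host 9090 instead of the default 8080
  apiserver:
    ports:
      - "9008:8008"   # host 9008 instead of the default 8008
  fileserver:
    ports:
      - "9081:8081"   # host 9081 instead of the default 8081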
Hi @<1556812486840160256:profile|SuccessfulRaven86>
it does not when I run a flask command inside my codebase. Is it an expected behavior? Do you have some workarounds for this?
Hmm, where do you have your Task.init?
(btw: what's the use case of a flask app tracking?)
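For context, this is roughly where I'd expect the Task.init call to sit (a sketch only; project/task names are placeholders): at module level, before the flask command spawns or reloads the worker process:
from flask import Flask
from clearml import Task

# initialize tracking once, at import time, before "flask run" starts the app
task = Task.init(project_name="examples", task_name="flask app")

app = Flask(__name__)

@app.route("/")
def index():
    return "ok"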
Then I deleted those workers,
How did you delete those workers? The autoscaler is supposed to spin down the EC2 instances when they are idle; in theory there is no need for manual spin-down.
Hi ContemplativeCockroach39
Seems like you are running the exact same code as in the git repo:
Basically it points you to the exact repository https://github.com/allegroai/clearml and the script examples/reporting/pandas_reporting.py
Specifically:
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/reporting/pandas_reporting.py
MysteriousBee56 and please this one: "when you run the trains-agent with --foreground, before it starts the docker it prints the full command line"
I assume every fit starts reporting from step 0, so they overwrite one another. Could that be it?
can you get the agent to execute the task on the current conda env without setting up new environment?
Wouldn't that break easily? Is this a way to avoid dockers, or a specific use case?
is there any other way to get task from the queue running locally in the current conda env?
You mean including cloning the code etc. but not installing any python packages?
Hi CurvedHedgehog15
User aborted: stopping task (3)
?
This means "someone" externally aborted the Task, in your case the HPO aborted it (the sophisticated HyperBand Bayesian optimization algorithms we use, both Optuna and HpBandster) will early stop experiments based on their performance and continue if they need later
docstring ?
Usually the preferred way is StorageManager
https://clear.ml/docs/latest/docs/references/sdk/storage
https://clear.ml/docs/latest/docs/integrations/storage
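Something like this (bucket and file paths are just placeholders):
from clearml import StorageManager

# download a remote object into the local cache and get its local path
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/data/file.csv")

# upload a local file to remote storage
remote_url = StorageManager.upload_file(
    local_file="model.pkl", remote_url="s3://my-bucket/models/model.pkl"
)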
Hello guys, I have 4 workers (2 in the default and 2 in the service queue on the same machine)
Hi @<1526734437587357696:profile|ShaggySquirrel23>
I think what happens is that one agent is deleting its cfg file when it is done, but at least in theory each one should have its own cfg.
One last request: can you try with the agent's latest RC version 1.5.3rc2?
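For reference, installing the RC is just (assuming pip):
pip install clearml-agent==1.5.3rc2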
WackyRabbit7 this is funny, it is not ClearML providing this offering;
some generic company grabbed the open source and put it there, which they should not.
Open source defaults
JitteryCoyote63
I am setting up a new machine with two RTX 3070 GPUs
Nice! You are one of the lucky few who managed to buy them.
Which makes me think that the wrong torch package is installed
I think that torch 1.3.1 does not support CUDA 11.
MistakenDragonfly51 just making sure I understand: on your machine (the one running the pytorch example), you have set CLEARML_DEFAULT_OUTPUT_URI / configured the "clearml.conf" file with default_output_uri, yet the model checkpoint was not uploaded?
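If it helps, this is roughly where that setting lives in clearml.conf (a sketch; the bucket URL is just a placeholder):
sdk {
    development {
        # default destination for uploaded models / artifacts
        default_output_uri: "s3://my-bucket/clearml"
    }
}
or equivalently via the environment variable:
export CLEARML_DEFAULT_OUTPUT_URI=s3://my-bucket/clearml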
Hi UnsightlyHorse88
Hmm, try adding to your clearml.conf file:
agent.cpu_only = true
If that does not work, try adding to the OS environment:
export CLEARML_CPU_ONLY=1
GrittyHawk31 by default any user can login (i.e. no need for password), if you want user/password access:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config/#web-login-authentication
Notice there is no need to have anything else in the apiserver.conf, just the user/pass section; everything else will just use the default values.
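For reference, the user/pass section looks roughly like this (check the linked docs page for the exact format; usernames and passwords below are placeholders):
auth {
    fixed_users {
        enabled: true
        pass_hashed: false
        users: [
            {
                username: "jane"
                password: "12345678"
                name: "Jane Doe"
            }
        ]
    }
}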
it should be fairly easy to write such a daemon
from time import time
from datetime import datetime

from clearml.backend_api.session.client import APIClient

client = APIClient()
timestamp = time() - 60 * 60 * 2  # last 2 hours
# query all tasks still in progress, created in the last 2 hours
tasks = client.tasks.get_all(
    status=["in_progress"],
    only_fields=["id"],
    order_by=["-last_update"],
    page_size=100,
    page=0,
    created=[">{}".format(datetime.utcfromtimestamp(timestamp))],
)
...
references:
https://clear.ml/...
Yes, but only with git clone.
It is not stored on ClearML; this way you can work with the experiment manager without explicitly giving away all your code.
More detailed instructions:
https://github.com/allegroai/trains-agent#installing-the-trains-agent
Hi ColossalDeer61 ,
Xxx is the module where my main experiment script resides.
So I think there are two options:
1. Assuming you have a similar folder structure:
- main_folder
-- package_folder
-- script_folder
--- script.py
Then if you set the "working directory" in the execution section to "." and the entry point to "script_folder/script.py", your code could do:
from package_folder import ABC
2. After cloning the original experiment, you can edit the "installed packages", and ad...
CooperativeFox72 yes, 20 experiments in parallel means that you always have at least 20 connections coming from different machines, and then you have the UI adding on top of it. I'm assuming the sluggishness you feel is the requests being delayed.
You can configure the API server to have more process workers; you just need to make sure the machine has enough memory to support it.
See: Add an experiment hyperparameter, and add gpu: True
Also I would suggest using Task.execute_remotely
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
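A minimal sketch (project/task and queue names are just examples):
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# everything up to this call runs locally; execute_remotely() enqueues the task
# for an agent to pick up and, with exit_process=True, stops the local process
task.execute_remotely(queue_name="default", exit_process=True)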
StaleMole4 you are printing the values before Task.init had the chance to populate them.
Basically, try moving the print to after closing the Task (closing the task waits for the async update).
Make sense?
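A minimal sketch of what I mean (project/task names are placeholders; replace the final print with whatever values you were printing):
from clearml import Task

task = Task.init(project_name="examples", task_name="example")
# ... your code ...
task.close()  # close() waits for the async update to finish
print(task.get_parameters())  # values are populated only after the update completes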
GrievingTurkey78 short answer: no.
Long answer: the files are stored as differentiable sets (think change sets from the previous version(s)). The collection of files is then compressed and stored as a single zip. The zip itself can be stored on Google, but on their object storage (not GDrive). Notice that the default storage for clearml-data is the clearml-server; that said, you can always mix and match (even between versions).
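For example, a rough sketch of pointing one dataset version at object storage (project, dataset name and bucket are placeholders):
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="my-dataset")
ds.add_files(path="data/")
# default upload target is the clearml-server fileserver; output_url lets you
# point this version at object storage instead, e.g. GCS
ds.upload(output_url="gs://my-bucket/datasets")
ds.finalize()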
Hi @<1524922424720625664:profile|TartLeopard58>
can't I embed scalars into Notion using the clearml sdk?
I think that you need the hosted version for it (it needs some special CORS stuff on the server side to make it work)
Did you try in the clearml report? does that work?
So actually, while we're at it, we also need to return a string from the model, which would be where the results are uploaded to (S3).
Is this being returned from your Triton Model? or the pre/post processing code?
The other way around: "8011:8008"
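i.e. in the docker-compose ports entry the host port comes first:
ports:
  - "8011:8008"   # host port 8011 -> container port 8008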