
I got everything working using the default queue. I can submit an experiment, and a new GPU node is provisioned, all good
Nice!
My next question, how do I add more queues?
You can create new queues in the UI and spin up a new glue for each queue (basically think of a queue as an abstraction for a specific type of resource)
Make sense?
Hi @<1686547344096497664:profile|ContemplativeArcticwolf43>
In the 2nd 'Getting Started' tutorial,
Could you send a link to the specific notebook?
. But whenever a task is picked, it fails for the following
You mean after the Task.init call?
Hm, one of the issues I have with this change is that now every dataset that doesn't have a semantic version cannot be loaded anymore
Okay we definitely need to solve that.
Any chance I can ask you to open a GitHub issue (just so we do not forget)?
I will pass it quickly along so that we can maybe offer a fix in the next RC
I would recommend reading this blog post, it should give you a glimpse of what can be built
https://medium.com/pytorch/how-trigo-built-a-scalable-ai-development-deployment-pipeline-for-frictionless-retail-b583d25d0dd
The fact the html file does not refresh in the browser even though there is a new copy of it uploaded.
Could not locate channel name 'gg_clearml'
CheerfulGorilla72 these are the permissions:
https://github.com/allegroai/clearml/blob/427b98270cc846b5d7e4af49f9732e3eb8d7d3ae/examples/services/monitoring/slack_alerts.py#L13
channels:join channels:read chat:write
@<1535793988726951936:profile|YummyElephant76> oh you mean like jupyter server was running, then inside the notebook you would start a new venv, in that venv "notebook" package was missing, hence it failed detecting the notebook ?
DeterminedCrab71 that is a good point, how does plotly adjust for nans on graphs?
Hi ShinyRabbit94
system_site_packages: true
This is set automatically when running in "docker mode", no need to worry
What exactly is the error you are getting?
Could it be that the container itself has the python packages installed in a venv and not as "system packages"?
Hi @<1541954607595393024:profile|BattyCrocodile47>
Do you mean starting a remote session directly from the VSCode UI instead of from the CLI, and connecting to it? If so, that would be awesome!! We have a remote session from the web, where it spins up a remote session for you and launches VSCode inside the container so you work on it in your browser. But a VSCode plugin is a great idea, do you have reference code for similar plugins?
Hi HelpfulDeer76
I mean that the task was being monitored on the demo ClearML server created by Allegro
Yes that is consistent with what I would expect to have happened
Basically if you are running it as a k8s job, you can just configure the following environment variables:
CLEARML_WEB_HOST:
CLEARML_API_HOST:
CLEARML_FILES_HOST:
CLEARML_API_ACCESS_KEY: <clearml access>
CLEARML_API_SECRET_KEY: <clearml secret>
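A minimal sketch of a pre-flight check for the variables listed above, assuming the job's entrypoint is a Python script. The helper name `missing_clearml_env` is made up for illustration and is not part of ClearML; it only inspects the environment and does not contact any server.

```python
import os

# Environment variable names quoted in the message above.
REQUIRED_VARS = (
    "CLEARML_WEB_HOST",
    "CLEARML_API_HOST",
    "CLEARML_FILES_HOST",
    "CLEARML_API_ACCESS_KEY",
    "CLEARML_API_SECRET_KEY",
)


def missing_clearml_env(env=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_clearml_env()
    if missing:
        print("Missing ClearML settings:", ", ".join(missing))
    else:
        print("All ClearML connection variables are set")
```

Running this at the start of the job gives a readable message instead of a connection error further down the line.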
@<1787653555927126016:profile|SoggyDuck67> notice the binary field in the Task "execution" tab, if for some reason it says "python3.10" it will try to use python 3.10 when running it.
That said, if it does not find the requested python version, it should output a warning and default to the python installed.
If you can provide the full log it will be helpful to see what happened there
we have some other parts, and in some cases the initialization time can be about 10 times the experiment time
Before I dive into some agent in agent hacking, I would consider "caching" this preprocessing on an auxiliary Task as an artifact. Basically add another argument for the auxiliary Task, and fetch the data from it (obviously you will need to run it once before the optimizer launches the first experiment).
Now that is out of the way (which really would be the preferred engin...
Hi CleanPigeon16
Put the specific git into the "installed packages" section
It should look like:... git+
...
(No need for the specific commit, you can just take the latest)
I wonder if this hack would work
Assume you upload an artifact/model to ' s3://storage.yandexcloud.net:443/clearml-models ' notice the port is added. Would that trigger a popup in the UI?
Also what happens if you add the credentials manually in the profile page?
Is there a solution for that?
Hi DisturbedElk70
Well assuming you mount/sync the "temp" folder of the offline experiment to a storage solution, then have another process (on the other side) syncing these folders, it will work and you will get "real-time" updates
Offline Folder: get_cache_dir() / 'offline' / task_id
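A small sketch of how that folder path is composed, assuming you already know the cache directory. The cache path and task id below are placeholders, and `offline_task_folder` is a hypothetical helper; it does not call ClearML's own `get_cache_dir()`.

```python
from pathlib import Path


def offline_task_folder(cache_dir, task_id):
    """Build <cache_dir>/offline/<task_id>, mirroring the layout quoted above."""
    return Path(cache_dir) / "offline" / task_id


# Placeholder values, for illustration only.
folder = offline_task_folder("/tmp/clearml_cache", "offline-1234")
print(folder)
```

This is the folder you would point the mount/sync process at.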
Hi @<1704304350400090112:profile|UpsetOctopus60>
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_kubernetes_helm
Just use the helm charts. It's the easiest
If I access the dataset on the same location directly it works fine:
Wait, I'm confused, how is the dataset there? Did it download the dataset?
Are you saying this line, for example, will fail? (assuming you actually have a dataset by that name)
data_path = Dataset.get(dataset_name="002_Datenset_MASAM_for_fintuning", alias="002_Datenset_MASAM_for_fintuning").get_local_copy()
Sure go to the "All Projects" and filter by Task Type, application / service
I added the link just in case anyway
Smart move :)
DilapidatedDucks58, of course there is, actually with the latest pip 20.1 and the next RC it will be automatically detected and put into "installed packages"
You can treat the "installed packages" just like you would any other "requirements.txt", just add: git+https://github.com/ ...
and you are good to go
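For reference, a sketch of what such a line looks like in pip's VCS requirement syntax. The repository URL below is a placeholder and `git_requirement` is a hypothetical helper, not a ClearML function; the optional `@<ref>` suffix pins a branch, tag, or commit.

```python
def git_requirement(repo_url, ref=None):
    """Format a pip-style VCS requirement line (hypothetical helper)."""
    line = "git+" + repo_url
    if ref:
        # Optional pin to a branch, tag, or commit hash.
        line += "@" + ref
    return line


# Placeholder repository, for illustration only.
print(git_requirement("https://github.com/example-org/example-repo.git"))
# prints: git+https://github.com/example-org/example-repo.git
```

The resulting string is what gets pasted into the "installed packages" section, one requirement per line.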
Whoa, are you saying there's an autoscaler that doesn't use EC2 instances?...
Just to be clear the ClearML Autoscaler (aws) will spin instances up/down based on jobs in the queue it is listening to (the type of EC2 instances and configuration is fully configurable)
Oh!
I see this is using the colab as remote agent (i.e. to launch jobs on it),
[ColabKernelApp] CRITICAL | Bad config encountered during initialization: The 'kernel_class' trait of <main.ColabKernelApp object at 0x7fa41b29e5c0> instance must be a type, but 'google.colab._kernel.Kernel' could not be imported
Can you send the full log?
BTW
/home/local/user/.clearml/venvs-builds/3.7/bin/python: can't open file 'train.py': [Errno 2] No such file or directory
This error is from the agent, correct? It seems it did not clone the correct code, is train.py committed to the repository?
Hi @<1627478122452488192:profile|AdorableDeer85>
Are you referring to running the pipeline on a remote machine ? could you provide the full Task/Pipeline log ?
Ok, I think I figured it out.
Nice!
ClearML doesn't add all the imported packages needed to run the task to the Installed Packages
It does, but not derivative packages (the ones pulled in by the required packages). The derivative packages are added when the agent runs the Task: it creates a new clean venv, installs the required packages, and then updates the list with everything in pip freeze, because that now represents all the packages the Task needs.
Two questions:
Is t...
Thanks @<1694157594333024256:profile|DisturbedParrot38> !
Nice catch.
Could you open a github issue so that at least we output a more informative error?
Oh, I was assuming you are passing the entire DB backups to the cloud.
Are you saying you just want the file server on the cloud ? if this is the case, I would just use S3
Hi UpsetCrocodile10
execute them and return scalars.
This should be a good start (I hope)
` from clearml import Task

# `children` is the list of child Tasks from the discussion above
for child in children:
    # put the Task into an execution queue
    Task.enqueue(child, queue_name='my_queue_here')
    # wait for the task to finish
    child.wait_for_status(status=['completed'])
    # reload all the metrics
    child.reload()
    # get the metrics
    print(child.get_last_scalar_metrics()) `
but this gives me an idea, I will try to check if the notebook is considered as trusted, perhaps it isn't and that causes issues?
This is exactly what I was thinking (communication with the jupyter service is done over http, to localhost, sometimes AV/Firewall software will block it, false-positive detection I assume)
Oh, if this is the case, then by all means push it into your Task's docker_setup_bash_script
It does not seem like it has to be done after the git clone, the only part that I can see is setting the PYTHONPATH to the additional repo you are pulling, and that should work.
The main hurdle might be passing credentials to git, but if you are using SSH it should be transparent
wdyt?