Hi DepressedChimpanzee34
I think the main issue here is slow response time from the API server. I "think" you can increase the number of API server processes, but considering the 16GB, I'm not sure you have the headroom.
At peak usage, how much free RAM do you have on the machine?
Hi MelancholyElk85
So the way datasets now work is that they are actually an entity (folder) inside a project, all under the hidden .datasets sub-project
This is so all data and tasks are on the same project, but at the same time will not intersect with subprojects by the same name. Does that make sense?
It is available of course, but I think you have to have clearml-server 1.9+
Which version are you running ?
SpotlessFish46 unless all the code is under "uncommitted changes" section, what you have is a link to the git repo + commit id
Not intentional! When I launched the AMI it was running an older version
I think this is exactly the reason they decided to change the location, so you will have to upgrade manually. The reasoning is that we changed directory names (maybe a few more things)
Yes: shut down the current docker-compose, curl the new docker-compose file, rename the folder, and spin it up again.
Full instructions here:
https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_aws_ec2_ami.html#upgrading
But they are all running inside the same pod, correct ?
Hmm, I think you have a point here, the confusing part is the cp cmd. Can you send the full log? (Regardless, can I assume you are running a rootless container?)
UnevenDolphin73 following the discussion https://clearml.slack.com/archives/CTK20V944/p1643731949324449 , I suggest this change in the pseudo code
` # task code
task = Task.init(...)
if not task.running_locally() and task.is_main_task():
    # pre-init stage
    StorageManager.download_folder(...)  # Prepare local files for execution
else:
    StorageManager.upload_file(...)  # Repeated for many files needed
    task.execute_remotely(...) `
Now when I look at it, it kind of makes sense to h...
It reflects what is stored by Keras, so if Keras stores the best model this is what you get. BTW if you pass output_uri=True it will automatically upload the models
orchestration module
When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?
regarding the exception stack
It's pointing to a stdout that was closed?! How could that be? Any chance you can provide a toy example for us to debug?
works seamlessly throughout and in our current on premise servers...
I'm assuming via something close to what I suggested above with .netrc ?
UnevenDolphin73
fatal: could not read Username for '': terminal prompts disabled
..
fatal: clone of '' into submodule path '/root/.clearml/vcs-cache/xxx.60db3666b11ac2df511a851e269817ef/xxx/xxx' failed
It seems it tries to clone a submodule and fails due to missing keys for the submodule.
https://stackoverflow.com/questions/7714326/git-submodule-url-not-including-username
wdyt?
Most likely yes, but I don't see how clearml would have an impact here, I am more inclined to think it would be a pytorch dataloader issue, although I don't see why
These are most certainly dataloader processes. But clearml-agent, when killing the process, should also kill all subprocesses, and it might be that something is going on that prevents it from killing the subprocesses ...
Is this easily reproducible? Can you verify it is still the case with the latest RC of clearml-agent?
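As a rough illustration of what "kill the process and all its subprocesses" can look like on a POSIX system, here is a minimal sketch that starts a command in its own process group and then signals the whole group. This is not clearml-agent's actual implementation; the helper names `start_in_new_group` and `kill_tree` are made up for the example.

```python
import os
import signal
import subprocess
import sys


def start_in_new_group(cmd):
    """Start cmd as the leader of a new session / process group (POSIX only)."""
    return subprocess.Popen(cmd, start_new_session=True)


def kill_tree(proc, timeout=5.0):
    """Send SIGTERM to the whole process group, escalate to SIGKILL on timeout."""
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The leader ignored SIGTERM: force-kill everything left in the group.
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    child = start_in_new_group([sys.executable, "-c", "import time; time.sleep(60)"])
    print(kill_tree(child))
```

Workers that move themselves into a different process group would escape this approach, which is one way leftover subprocesses can survive a kill.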
Guys, any chance you can verify the RC solves the issue?
pip install clearml==1.0.2rc0
So you mean 1.3.1 should fix this bug?
Yes, it should. See the release notes, there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0
Should work in all cases, plotly/matplotlib/scalar_report
DiminutiveToad80 try to turn on:
enable_git_ask_pass: true
Hi DepressedChimpanzee34 , took me a while but I think there is a solution:
In your docker-compose file, replace:
https://github.com/allegroai/clearml-server/blob/a64c4d264d00eadd2d11818b37151d3cc6266d99/docker/docker-compose.yml#L5
with
entrypoint: /bin/bash
command: -c "mkdir -p /var/log/clearml && cd /opt/clearml/ && python3 -m apiserver.apierrors_generator && gunicorn -w 4 -t 600 --bind=0.0.0.0:8008 apiserver.server:app"
Hi VexedCat68
So if I understand correctly, the issue is this argument:
parameter_override={'Args/dataset_id': '${split_dataset.split_dataset_id}', 'Args/model_id': '${get_latest_model_id.clearml_model_id}'},
I think what is missing is telling it that this is an artifact:
parameter_override={'Args/dataset_id': '${split_dataset.artifacts.split_dataset_id.url}', 'Args/model_id': '${get_latest_model_id.clearml_model_id}'},
You can see the example here:
https://clear.ml/docs/latest/docs/ref...
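To make the difference between the two references concrete, here is a small sketch that builds the `${...}` string for an artifact. The helper `artifact_url_ref` is hypothetical and not part of the clearml API; only the string format is taken from the snippet above.

```python
def artifact_url_ref(step_name, artifact_name):
    """Build a '${<step>.artifacts.<artifact>.url}' reference string.

    Hypothetical helper: the format is copied from the answer above.
    """
    return "${%s.artifacts.%s.url}" % (step_name, artifact_name)


parameter_override = {
    "Args/dataset_id": artifact_url_ref("split_dataset", "split_dataset_id"),
    # Left exactly as in the original snippet.
    "Args/model_id": "${get_latest_model_id.clearml_model_id}",
}

print(parameter_override["Args/dataset_id"])
```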
I can then programmatically choose which file to import with importlib. Is there a way to tell clearml programmatically to analyze the files, so it can built up the requirements correctly?
Sadly no
It analyzes the running code, then if it decides it is not a self contained script it will analyze the entire repo ...
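To see why imports chosen at runtime are hard for any static analysis to pick up, here is a minimal sketch using Python's `ast` module. It is a generic illustration, not clearml's actual analyzer; `static_imports` and `SCRIPT` are names invented for the example.

```python
import ast


def static_imports(source):
    """Return the root package names found in import statements."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


SCRIPT = '''
import importlib
import os

plugin = importlib.import_module("numpy")  # chosen at runtime
'''

print(sorted(static_imports(SCRIPT)))
```

The scan reports `importlib` and `os`, but not `numpy`, because the module name only exists as a string argument until the code actually runs.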
I just saw that Task.create takes
Task.create is not Task.init. It is meant to allow you to create new Tasks (think Jobs) from ...
LOL AlertBlackbird30 had a PR and pulled it
Major release due next week, after that we will put a roadmap on the main GitHub page.
Anything specific you have in mind ?
any chance StorageManager could re-download files only if their size is different from file in cache (as an option)?
I think there is a force argument, to force the download.
I think the main issue is getting the size from different backends (e.g. s3 / https / etc.)
Maybe we should add it as a GitHub feature request issue?
The main limitation is that the driver "list()" does not return file size.
For example it might be an issue with the default http files-server.
wdyt?
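For the feature request, the size comparison itself could be as simple as the sketch below. It assumes the backend can report a remote size at all, which is exactly the open question; `needs_download` is a hypothetical helper, not an existing StorageManager function.

```python
import os
import tempfile


def needs_download(local_path, remote_size):
    """Return True if there is no cached file or its size differs from remote_size."""
    if remote_size is None:
        # Backend could not report a size: fall back to downloading again.
        return True
    if not os.path.isfile(local_path):
        return True
    return os.path.getsize(local_path) != remote_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cached = os.path.join(tmp, "data.bin")
        with open(cached, "wb") as f:
            f.write(b"12345")
        print(needs_download(cached, 5))     # same size
        print(needs_download(cached, 6))     # size changed
        print(needs_download(cached, None))  # size unknown
```

A size check alone will miss edits that keep the file length unchanged, so it would only make sense as an opt-in option.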
Is task.parent something that could help?
Exactly, something like:
# my step is running here
the_pipeline_task = Task.get_task(task_id=task.parent)
I always have my notebooks in git repo but suddenly it's not running them correctly.
What do you mean?
Can I switch off git diff (change detection?)
Yes, Task.init(..., auto_connect_frameworks={"detect_repository": False})