...ngrok to connect to the remote server at the office?
That makes sense, I guess this is the equivalent of using a VPN, from that point onward clearml-session can directly access the remote machine, right?
JitteryCoyote63 see here https://stackoverflow.com/questions/55385900/pip3-setup-py-install-requires-pep-508-git-url-for-private-repo Bottom line: you have to add package@ before the link. But if you do that and the package is already installed, pip will not reinstall it from the git repo; this is a pip issue. I think that since the agent installs everything from scratch, it should work for you. Wdyt?
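For reference, a minimal sketch of that PEP 508 direct-reference syntax in a setup.py (package name, org, and tag are hypothetical):
` from setuptools import setup

setup(
    name="my_project",
    install_requires=[
        # the "my_private_pkg @" prefix is the PEP 508 direct reference;
        # without it pip rejects a bare git URL inside install_requires
        "my_private_pkg @ git+https://github.com/my-org/my_private_pkg.git@v1.2.3",
    ],
) `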
Exactly, that's my problem: I want to remove it to make sure it is reinstalled (because the version can change)
JitteryCoyote63 yes, this is definitely a pip bug... can you test with the latest pip version, maybe it was fixed? (i.e. git+https:// link)
With env caching enabled, it won't reinstall this private dependency, right?
It will, local packages (".") and git packages are always reinstalled even if using venv caching, exactly for that reason 🙂
Ohh so the setup.py is the one containing these requirements, oops I totally missed that :( let me check what the PEP has to say about that ... (Basically this is not a clearml issue but a pip one...)
Hi JitteryCoyote63
Or even better: would it be possible to have a support for HTML files as artifacts?
If you report HTML files as debug media, they will be previewed, as long as the link is accessible.
You can check this example:
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
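For example, a minimal sketch of reporting a local HTML file as debug media (file, project, and task names are made up):
` from clearml import Task

task = Task.init(project_name="examples", task_name="html preview")
# report a local html file as debug media; the UI will preview it
# as long as the uploaded link is accessible
task.get_logger().report_media(
    title="report", series="overview", iteration=0, local_path="report.html"
) `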
In the artifacts, I think HTML is also supported (maybe not previewed as nicely, but clickable).
Regarding the S3 link, I think you are supposed to get a popup window as...
JitteryCoyote63 S3 should work. Go to your profile page and check whether some old credentials are already there; maybe this is the issue.
Hi ReassuredTiger98
I do not want to create extra queues for this since this will not be able to properly distribute tasks.
Queues are the way to abstract different resources into "compute capabilities". They create a simple interface for users on the one hand, and allow you to control the compute on the other. Agents can listen to multiple queues with priority. This means an RTX agent can pull from an RTX queue, and if it is empty, it will pull from the "default" queue. Would that work for ...
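On the enqueue side it could look roughly like this (queue name and template task id are hypothetical; the agent side is only a matter of which queues each agent listens to):
` from clearml import Task

# clone a template task and enqueue it on the "rtx" queue; an agent
# listening on ["rtx", "default"] will pull from "rtx" first
template = Task.get_task(task_id="<template_task_id>")
cloned = Task.clone(source_task=template, name="rtx run")
Task.enqueue(cloned, queue_name="rtx") `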
task.set_script(working_dir=dir, entry_point="my_script.py")
Why do you have this part? Isn't it the same code? The script entry point is auto-detected.
... or when I run my_script.py locally (in order to create and enqueue the task)?
The latter, when the script is running locally.
So something like
os.path.join(os.path.dirname(__file__), "requirements.txt")
is the right way?
Sure this will work 🙂
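If the goal is pointing the agent at that requirements file, a minimal sketch (assuming Task.add_requirements is the intended API; it must be called before Task.init):
` import os
from clearml import Task

# register the requirements file sitting next to this script, so the
# agent installs from it instead of the auto-detected package list
Task.add_requirements(os.path.join(os.path.dirname(__file__), "requirements.txt"))
task = Task.init(project_name="examples", task_name="custom requirements") `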
but it is not possible to write to a private channel in which the bot is added.
Is this a Slack limitation ?
CheerfulGorilla72 sounds like a great idea, I'll pass it along to the documentation ppl 🙂
we made two TensorBoard (TB) writers per task and wrote in parallel.
And I wanted to know if it is possible here as well.
Basically you will have different series (based on the TB log file) on the same graph so you can compare them, all automatically 🙂
I thought that maybe we need to write everything in one place
It will be in the same place, under the main Task
Should work out of the box
BTW: Basically just call Task.init(...)
the rest is magic 🙂
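A minimal sketch of what that looks like (assuming PyTorch's SummaryWriter; names are made up):
` from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="two tb writers")

# two writers logging in parallel; both are picked up automatically and
# show up as different series on the same graph under the main Task
writer_a = SummaryWriter(log_dir="runs/version_a")
writer_b = SummaryWriter(log_dir="runs/version_b")
for step in range(100):
    writer_a.add_scalar("loss", 1.0 / (step + 1), step)
    writer_b.add_scalar("loss", 0.5 / (step + 1), step) `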
JitteryCoyote63 What did you have in mind?
It's the safest way to run multiple processes and make sure they are cleaned afterwards ...
It will automatically switch to docker mode
ReassuredTiger98 there is an open issue on supporting a bash script as a pre-run step inside a docker (which will be supported in the next major release)
BTW: if you already have a Dockerfile, the fastest way would be to build and push it once; then you just specify the docker image:tag. This can be done at a Task-specific level.
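For instance, a sketch of pinning a pre-built image on a specific Task (the image tag is hypothetical):
` from clearml import Task

task = Task.init(project_name="examples", task_name="docker task")
# an agent running in docker mode will spin this image for the task
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04") `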
It seems like there is no way to define that a Task requires docker support from an agent, right?
Correct. Basically the idea is that workers run either in venv mode or in docker mode.
If you have a mixture of the two, then you can have the venv agents pulling from one queue (say default_venv) and the docker mode agents pulling from a different queue (say default_docker). This way you always know what you are getting when you enqueue your Task
I also found that you should have a deterministic ordering before you apply a fixed seed
Not sure I follow ?
Hi, if you don't mind having a look too,
With pleasure :)
According to the above I was expecting the config to be auto-magically updated with the new YAML config I edited in the UI; however, it seems like an additional step is required... probably connect_dict? Or am I missing something?
Notice the OmegaConf section description: "Full OmegaConf YAML configuration. This is a read-only section, unless 'Hydra/_allow_omegaconf_edit_' is set to True"
By default it will alw...
Hi ScaryLeopard77
I think the error message you are getting is actually "passed" from Triton. Basically someone needs to tell it what the model input/output look like (matrix size/type); this is essentially the content of the "config.pbtxt", and it has to be set when spinning up the model endpoint. Does that make sense to you?
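For illustration, a bare-bones config.pbtxt sketch (model name, platform, tensor names, and shapes are all hypothetical):
` name: "my_model"
platform: "onnxruntime_onnx"
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
] `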
Pseudo-ish code:
1. Create the pipeline:
` pipeline = Task.create(..., task_type="controller")
pipeline.mark_started()
print(pipeline.id) `
2. launch step A (pass arguments via command line argument / os environment)
` task = Task.init(...)
pipeline_id = os.environ['MY_MAIN_PIPELINE']
pipeline_task = Task.get_task(task_id=pipeline_id)
# send some metrics / reports etc.
pipeline_task.get_logger().report_scalar(...)
pipeline_task.get_logger().report_text(...) `
wdyt? (obviously you need to somehow pass th...
Hi ScaryLeopard77
You can probably do: Task.init(..., continue_last_task='task_id_here')
This will continue a previously executed Task and log both steps in the same place.
Does that help?
BTW: you can also of course manually report to any Task while it is still running with:
` aux_task = Task.get_task(task_id_here)
aux_task.get_logger().report_scalar(...) `
Hi MinuteWalrus85
This is a great question, and super important when training models. This is why we designed a whole system to manage datasets (including storage querying, balancing data, and caching). Unfortunately this is only available in the paid tier of Allegro... You are welcome to contact the sales guys: https://allegro.ai/enterprise/
🙂
does this mean that the Task stores --args (and propagates these further through the code as CLI arguments) somewhere where I can get and manipulate them from my code?
Yes, it changes the actual argparse object and pushes the new values in at runtime; basically your parse_args() call will return the values from the UI (backend)
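Roughly, the flow looks like this (project and task names are made up):
` import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)

task = Task.init(project_name="examples", task_name="argparse demo")

# when executed by an agent, the values edited in the UI are injected
# here, so args.lr may differ from the default above
args = parser.parse_args()
print(args.lr) `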
Yes this is a misleading title
I think the main issue is that for some reason the running container changed one of the files inside the temp folder. The host machine is then "stuck" with a file that the root user owned/changed, and it cannot reuse or delete the temp folder.
I think the fix is to make sure the container deletes the temp folder when it is done.